Soumitra Sulav created HDDS-6584:
------------------------------------
Summary: [spark] Spark-HWC Error log with AcidUtils
Key: HDDS-6584
URL: https://issues.apache.org/jira/browse/HDDS-6584
Project: Apache Ozone
Issue Type: Bug
Components: build
Affects Versions: 1.3.0
Reporter: Soumitra Sulav
Attachments: spark-hwc-aicderror-debug.log, spark-hwc-aicderror-info.log
AcidUtils error messages are observed when the Spark HiveWarehouseConnector reads tables backed by OzoneFileSystem. The job does not abort, but this might lead to issues in ACID scenarios.
*Test:* TPC-DS queries are run via Spark-HWC on the Ozone filesystem.
Table info for the table under query:
{code:java}
|# Detailed Table Information| |
|Database |tpcds_src |
|Table |store_sales |
|Owner |hrt_qa |
|Created Time |Fri Mar 04 14:48:15 UTC 2022 |
|Last Access |Thu Jan 01 00:00:00 UTC 1970 |
|Created By |Spark 2.2 or prior |
|Type |EXTERNAL |
|Provider |hive |
|Table Properties |[numFilesErasureCoded=0, bucketing_version=2, transient_lastDdlTime=1646405295] |
|Statistics |388445409 bytes |
|Location |o3fs://hivetest.ozonestage.ozone1/user/hrt_qa/tpcds/tests/data/store_sales |
|Serde Library |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
|InputFormat |org.apache.hadoop.mapred.TextInputFormat |
|OutputFormat |org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
|Storage Properties |[serialization.format=|, field.delim=|] |
|Partition Provider |Catalog |
{code}
*Info Logs:*
{code:java}
# spark-shell --jars /opt/cloudera/parcels/CDH/lib/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.7.1.7.1000-114.jar \
    --master yarn --deploy-mode client \
    --conf spark.sql.broadcastTimeout=1000 \
    --conf spark.datasource.hive.warehouse.read.mode=DIRECT_READER_V2 \
    --conf spark.sql.extensions=com.hortonworks.spark.sql.rule.Extensions \
    --conf spark.driver.memory=15g \
    --conf spark.network.timeout=1000s \
    --conf spark.sql.crossJoin.enabled=true \
    --conf spark.eventLog.enabled=false \
    --conf spark.sql.hive.hiveserver2.jdbc.url.principal=hive/[email protected] \
    --conf spark.executor.memory=2g \
    --conf spark.kryo.registrator=com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator \
    --conf spark.driver.log.persistToDfs.enabled=false \
    --conf spark.security.credentials.hiveserver2.enabled=true \
    --name "PySparkShellT"
{code}
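For reference, here is a minimal sketch of the same HWC direct-reader configuration applied programmatically through the standard SparkSession builder API. The config keys and class names are copied from the command line above; the HWC assembly jar is assumed to be on the classpath, and the sample query at the end is only illustrative.
{code:java}
import org.apache.spark.sql.SparkSession;

public class HwcDirectReaderSession {
  public static void main(String[] args) {
    // Sketch only: same settings as the spark-shell invocation above,
    // assuming the HWC assembly jar is already on the classpath.
    SparkSession spark = SparkSession.builder()
        .appName("PySparkShellT")
        .master("yarn")
        .config("spark.datasource.hive.warehouse.read.mode", "DIRECT_READER_V2")
        .config("spark.sql.extensions", "com.hortonworks.spark.sql.rule.Extensions")
        .config("spark.kryo.registrator",
            "com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator")
        .config("spark.security.credentials.hiveserver2.enabled", "true")
        .getOrCreate();

    // Queries issued through spark.sql(...) then take the HWC
    // DIRECT_READER_V2 path, as in the shell session below.
    spark.sql("SELECT count(*) FROM tpcds_src.store_sales").show();
  }
}
{code}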
{code:java}
scala> spark.sql("SELECT * FROM ( SELECT i_category, i_class, i_brand, s_store_name, s_company_name, d_moy, sum(ss_sales_price) sum_sales, avg(sum(ss_sales_price)) OVER (PARTITION BY i_category, i_brand, s_store_name, s_company_name) avg_monthly_sales FROM item, store_sales, date_dim, store WHERE ss_item_sk = i_item_sk AND ss_sold_date_sk = d_date_sk AND ss_store_sk = s_store_sk AND d_year IN (1999) AND ((i_category IN ('Books', 'Electronics', 'Sports') AND i_class IN ('computers', 'stereo', 'football')) OR (i_category IN ('Men', 'Jewelry', 'Women') AND i_class IN ('shirts', 'birdal', 'dresses'))) GROUP BY i_category, i_class, i_brand, s_store_name, s_company_name, d_moy) tmp1 WHERE CASE WHEN (avg_monthly_sales <> 0) THEN (abs(sum_sales - avg_monthly_sales) / avg_monthly_sales) ELSE NULL END > 0.1 ORDER BY sum_sales - avg_monthly_sales, s_store_name LIMIT 100").show()
22/03/07 12:45:09 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
Hive Session ID = 500e8d1c-0481-4dd8-96ce-7040f9ebea0f
22/03/07 12:45:10 INFO rule.HWCSwitchRule: using DIRECT_READER_V2 extension for reading
22/03/07 12:45:10 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
22/03/07 12:45:10 INFO rule.HWCSwitchRule: using DIRECT_READER_V2 extension for reading
22/03/07 12:45:10 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
22/03/07 12:45:11 INFO rule.HWCSwitchRule: using DIRECT_READER_V2 extension for reading
22/03/07 12:45:11 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
22/03/07 12:45:11 INFO rule.HWCSwitchRule: using DIRECT_READER_V2 extension for reading
22/03/07 12:45:11 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
22/03/07 12:45:12 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
22/03/07 12:45:12 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
22/03/07 12:45:13 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
22/03/07 12:45:13 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
22/03/07 12:45:13 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
22/03/07 12:45:13 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
22/03/07 12:45:13 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
22/03/07 12:45:14 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
22/03/07 12:45:14 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
22/03/07 12:45:15 ERROR io.AcidUtils: Failed to get files with ID; using regular API: Only supported for DFS; got class org.apache.hadoop.fs.ozone.OzoneFileSystem
22/03/07 12:45:15 WARN impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-xceiverclientmetrics.properties,hadoop-metrics2.properties
22/03/07 12:45:16 ERROR io.AcidUtils: Failed to get files with ID; using regular API: Only supported for DFS; got class org.apache.hadoop.fs.ozone.OzoneFileSystem
22/03/07 12:45:16 ERROR io.AcidUtils: Failed to get files with ID; using regular API: Only supported for DFS; got class org.apache.hadoop.fs.ozone.OzoneFileSystem
22/03/07 12:45:16 ERROR io.AcidUtils: Failed to get files with ID; using regular API: Only supported for DFS; got class org.apache.hadoop.fs.ozone.OzoneFileSystem
22/03/07 12:45:16 ERROR io.AcidUtils: Failed to get files with ID; using regular API: Only supported for DFS; got class org.apache.hadoop.fs.ozone.OzoneFileSystem
22/03/07 12:45:17 ERROR io.AcidUtils: Failed to get files with ID; using regular API: Only supported for DFS; got class org.apache.hadoop.fs.ozone.OzoneFileSystem
{code}
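Judging by the message, the ERROR is raised because the file-ID-aware listing used by Hive's ACID code path is gated on the concrete FileSystem being an HDFS DistributedFileSystem; OzoneFileSystem fails that check, so the code logs the error and falls back to the regular listing API, which is why the job still completes. Below is a minimal sketch of that check-and-fallback behaviour as implied by the log; it is an illustration, not the actual Hive source, and the class and method names are made up.
{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class AcidListingSketch {

  // Hypothetical stand-in for the file-ID-aware listing: the log implies it
  // is HDFS-only, so OzoneFileSystem (or any non-DistributedFileSystem)
  // throws here.
  static FileStatus[] listWithFileIds(FileSystem fs, Path dir) throws IOException {
    if (!(fs instanceof DistributedFileSystem)) {
      throw new UnsupportedOperationException(
          "Only supported for DFS; got " + fs.getClass());
    }
    return fs.listStatus(dir); // real code would also fetch HDFS inode IDs
  }

  static FileStatus[] listDir(FileSystem fs, Path dir) throws IOException {
    try {
      return listWithFileIds(fs, dir);
    } catch (UnsupportedOperationException e) {
      // Matches the observed ERROR: log it, then fall back to the
      // "regular API" (plain listStatus), so the job keeps running.
      System.err.println("Failed to get files with ID; using regular API: "
          + e.getMessage());
      return fs.listStatus(dir);
    }
  }
}
{code}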
Attached are [^spark-hwc-aicderror-debug.log] and [^spark-hwc-aicderror-info.log].