[ 
https://issues.apache.org/jira/browse/HDDS-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal reopened HDDS-6584:
---------------------------------

> [spark] Spark-HWC Error log with AcidUtils
> ------------------------------------------
>
>                 Key: HDDS-6584
>                 URL: https://issues.apache.org/jira/browse/HDDS-6584
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: build
>    Affects Versions: 1.3.0
>            Reporter: Soumitra Sulav
>            Priority: Major
>         Attachments: spark-hwc-aicderror-debug.log, 
> spark-hwc-aicderror-info.log
>
>
> AcidUtils error messages are observed when Spark's HiveWarehouseConnector reads 
> tables stored on OzoneFileSystem.
> The job does not abort, but this might lead to issues in ACID scenarios.
> *Test:* TPCDS queries are run via Spark HWC against the Ozone filesystem.
> Table information for the table under query:
> {code:java}
> |# Detailed Table Information|
> |Database                    |tpcds_src
> |Table                       |store_sales
> |Owner                       |hrt_qa
> |Created Time                |Fri Mar 04 14:48:15 UTC 2022
> |Last Access                 |Thu Jan 01 00:00:00 UTC 1970
> |Created By                  |Spark 2.2 or prior
> |Type                        |EXTERNAL
> |Provider                    |hive
> |Table Properties            |[numFilesErasureCoded=0, bucketing_version=2, 
> transient_lastDdlTime=1646405295]
> |Statistics                  |388445409 bytes
> |Location  
> |o3fs://hivetest.ozonestage.ozone1/user/hrt_qa/tpcds/tests/data/store_sales
> |Serde Library          |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> |InputFormat            |org.apache.hadoop.mapred.TextInputFormat
> |OutputFormat  |org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> |Storage Properties          |[serialization.format=|, field.delim=|]
> |Partition Provider          |Catalog {code}
> *Info Logs:*
> {code:java}
> # spark-shell --jars 
> /opt/cloudera/parcels/CDH/lib/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.7.1.7.1000-114.jar
>  --master yarn --deploy-mode client --conf spark.sql.broadcastTimeout=1000 
> --conf spark.datasource.hive.warehouse.read.mode=DIRECT_READER_V2 --conf 
> spark.sql.extensions=com.hortonworks.spark.sql.rule.Extensions --conf 
> spark.driver.memory=15g --conf spark.network.timeout=1000s --conf 
> spark.sql.crossJoin.enabled=true --conf spark.eventLog.enabled=false --conf 
> spark.sql.hive.hiveserver2.jdbc.url.principal=hive/[email protected]
>  --conf spark.executor.memory=2g --conf 
> spark.kryo.registrator=com.qubole.spark.hiveacid.util.HiveAcidKyroRegistrator 
> --conf spark.driver.log.persistToDfs.enabled=false --conf 
> spark.security.credentials.hiveserver2.enabled=true --name "PySparkShellT" 
> {code}
> {code:java}
> scala> spark.sql("SELECT * FROM ( SELECT i_category, i_class, i_brand, 
> s_store_name, s_company_name, d_moy, sum(ss_sales_price) sum_sales, 
> avg(sum(ss_sales_price)) OVER (PARTITION BY i_category, i_brand, 
> s_store_name, s_company_name) avg_monthly_sales FROM item, store_sales, 
> date_dim, store WHERE ss_item_sk = i_item_sk AND ss_sold_date_sk = d_date_sk 
> AND ss_store_sk = s_store_sk AND d_year IN (1999) AND ((i_category IN 
> ('Books', 'Electronics', 'Sports') AND i_class IN ('computers', 'stereo', 
> 'football')) OR (i_category IN ('Men', 'Jewelry', 'Women') AND i_class IN 
> ('shirts', 'birdal', 'dresses'))) GROUP BY i_category, i_class, i_brand, 
> s_store_name, s_company_name, d_moy) tmp1 WHERE CASE WHEN (avg_monthly_sales 
> <> 0) THEN (abs(sum_sales - avg_monthly_sales) / avg_monthly_sales) ELSE NULL 
> END > 0.1 ORDER BY sum_sales - avg_monthly_sales, s_store_name LIMIT 
> 100").show()
> 22/03/07 12:45:09 WARN conf.HiveConf: HiveConf of name hive.masking.algo does 
> not exist
> Hive Session ID = 500e8d1c-0481-4dd8-96ce-7040f9ebea0f
> 22/03/07 12:45:10 INFO rule.HWCSwitchRule: using DIRECT_READER_V2 extension 
> for reading
> 22/03/07 12:45:10 WARN conf.HiveConf: HiveConf of name hive.masking.algo does 
> not exist
> 22/03/07 12:45:10 INFO rule.HWCSwitchRule: using DIRECT_READER_V2 extension 
> for reading
> 22/03/07 12:45:10 WARN conf.HiveConf: HiveConf of name hive.masking.algo does 
> not exist
> 22/03/07 12:45:11 INFO rule.HWCSwitchRule: using DIRECT_READER_V2 extension 
> for reading
> 22/03/07 12:45:11 WARN conf.HiveConf: HiveConf of name hive.masking.algo does 
> not exist
> 22/03/07 12:45:11 INFO rule.HWCSwitchRule: using DIRECT_READER_V2 extension 
> for reading
> 22/03/07 12:45:11 WARN conf.HiveConf: HiveConf of name hive.masking.algo does 
> not exist
> 22/03/07 12:45:12 WARN conf.HiveConf: HiveConf of name hive.masking.algo does 
> not exist
> 22/03/07 12:45:12 WARN conf.HiveConf: HiveConf of name hive.masking.algo does 
> not exist
> 22/03/07 12:45:13 WARN conf.HiveConf: HiveConf of name hive.masking.algo does 
> not exist
> 22/03/07 12:45:13 WARN conf.HiveConf: HiveConf of name hive.masking.algo does 
> not exist
> 22/03/07 12:45:13 WARN conf.HiveConf: HiveConf of name hive.masking.algo does 
> not exist
> 22/03/07 12:45:13 WARN conf.HiveConf: HiveConf of name hive.masking.algo does 
> not exist
> 22/03/07 12:45:13 WARN conf.HiveConf: HiveConf of name hive.masking.algo does 
> not exist
> 22/03/07 12:45:14 WARN conf.HiveConf: HiveConf of name hive.masking.algo does 
> not exist
> 22/03/07 12:45:14 WARN conf.HiveConf: HiveConf of name hive.masking.algo does 
> not exist
> 22/03/07 12:45:15 ERROR io.AcidUtils: Failed to get files with ID; using 
> regular API: Only supported for DFS; got class 
> org.apache.hadoop.fs.ozone.OzoneFileSystem
> 22/03/07 12:45:15 WARN impl.MetricsConfig: Cannot locate configuration: tried 
> hadoop-metrics2-xceiverclientmetrics.properties,hadoop-metrics2.properties
> 22/03/07 12:45:16 ERROR io.AcidUtils: Failed to get files with ID; using 
> regular API: Only supported for DFS; got class 
> org.apache.hadoop.fs.ozone.OzoneFileSystem
> 22/03/07 12:45:16 ERROR io.AcidUtils: Failed to get files with ID; using 
> regular API: Only supported for DFS; got class 
> org.apache.hadoop.fs.ozone.OzoneFileSystem
> 22/03/07 12:45:16 ERROR io.AcidUtils: Failed to get files with ID; using 
> regular API: Only supported for DFS; got class 
> org.apache.hadoop.fs.ozone.OzoneFileSystem
> 22/03/07 12:45:16 ERROR io.AcidUtils: Failed to get files with ID; using 
> regular API: Only supported for DFS; got class 
> org.apache.hadoop.fs.ozone.OzoneFileSystem
> 22/03/07 12:45:17 ERROR io.AcidUtils: Failed to get files with ID; using 
> regular API: Only supported for DFS; got class 
> org.apache.hadoop.fs.ozone.OzoneFileSystem {code}
> Attached are [^spark-hwc-aicderror-debug.log] and 
> [^spark-hwc-aicderror-info.log].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
