yyh2954360585 opened a new issue, #4885:
URL: https://github.com/apache/kyuubi/issues/4885
### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

### Search before asking

- [X] I have searched in the [issues](https://github.com/apache/kyuubi/issues?q=is%3Aissue) and found no similar issues.

### Describe the bug

Spark version: 3.2.3
Hudi version: 0.13.0

**Description**: I connect to the Spark SQL query engine through Kyuubi, which exposes the service via the Hive JDBC driver, and try to drop a column from a Hudi table using Hudi Schema Evolution. The statement fails with: `DROP COLUMN is only supported with v2 tables`. Dropping the same column through Spark SQL directly works without any problem.

Using the Hudi Schema Evolution feature requires two configurations to be set:

`set hoodie.schema.on.read.enable=true;`
`set hoodie.datasource.write.schema.allow.auto.evolution.column.drop=true;`

If these are not set in Spark SQL either, dropping a Hudi table column produces the same error message. So it seems that the configurations I set through the Kyuubi (Spark engine) JDBC connection do not take effect.
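For reference, this is a sketch of the statements I run over the Hive JDBC connection (the database, table, and column names below are placeholders):

```sql
-- the two Hudi Schema Evolution configs mentioned above
set hoodie.schema.on.read.enable=true;
set hoodie.datasource.write.schema.allow.auto.evolution.column.drop=true;

-- placeholder table and column; through Kyuubi this statement fails with
-- "DROP COLUMN is only supported with v2 tables", while the same sequence
-- succeeds in the spark-sql shell
alter table test_db.hudi_tbl drop column col1;
```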
**Using SparkSQL operations**: (screenshot)

**Using Kyuubi operations**: (screenshot)

### Affects Version(s)

1.6.0

### Kyuubi Server Log Output

_No response_

### Kyuubi Engine Log Output

_No response_

### Kyuubi Server Configurations

```yaml
# (ASF license header omitted)
#
## Kyuubi Configurations
#
# kyuubi.authentication NONE
#
kyuubi.frontend.bind.host localhost
kyuubi.frontend.bind.port 8788

# HA
kyuubi.ha.zookeeper.quorum xxx:2181,xxx:2181,xxx:2181

# connection pool
kyuubi.frontend.thrift.max.worker.threads 500000
kyuubi.frontend.mysql.max.worker.threads 500000

# share
kyuubi.engine.share.level USER
#kyuubi.engine.single.spark.session true
#kyuubi.engine.share.level SERVER

spark.dynamicAllocation.enabled=true
## false if you prefer shuffle tracking over ESS
spark.shuffle.service.enabled=true
spark.dynamicAllocation.initialExecutors=5
spark.dynamicAllocation.minExecutors=5
spark.dynamicAllocation.maxExecutors=500
spark.dynamicAllocation.executorAllocationRatio=0.5
spark.dynamicAllocation.executorIdleTimeout=60s
spark.dynamicAllocation.cachedExecutorIdleTimeout=30min
## true if you prefer shuffle tracking over ESS
spark.dynamicAllocation.shuffleTracking.enabled=false
spark.dynamicAllocation.shuffleTracking.timeout=30min
spark.dynamicAllocation.schedulerBacklogTimeout=1s
spark.dynamicAllocation.sustainedSchedulerBacklogTimeout=1s
spark.cleaner.periodicGC.interval=5min

# Monitoring on Prometheus
kyuubi.metrics.reporters PROMETHEUS
kyuubi.metrics.prometheus.port 10019
kyuubi.metrics.prometheus.path /metrics

# JDBC Authentication
kyuubi.authentication=JDBC
kyuubi.authentication.jdbc.driver.class=com.mysql.jdbc.Driver
kyuubi.authentication.jdbc.url=jdbc:mysql://xx.xx.xx.xx:3306/kyuubi
kyuubi.authentication.jdbc.user=xxx
kyuubi.authentication.jdbc.password=xxx
kyuubi.authentication.jdbc.query=SELECT 1 FROM t_kyuubi_user WHERE user=${user} AND passwd=md5(${password})

# Spark Configurations
spark.master yarn
spark.yarn.jars=hdfs://mycluster/spark-jars/*.jar
spark.executor.memory 5G
spark.executor.cores 3
spark.executor.heartbeatInterval 200000
spark.network.timeout 300000
#spark.dynamicAllocation.enabled true
#spark.dynamicAllocation.minExecutors 0
#spark.dynamicAllocation.maxExecutors 20
#spark.dynamicAllocation.executorIdleTimeout 60
spark.submit.deployMode cluster
spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog
spark.sql.parquet.datetimeRebaseModeInRead=CORRECTED
spark.kryoserializer.buffer.max=512
spark.hadoop.hive.exec.dynamic.partition=true
spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict
spark.hoodie.schema.on.read.enable=true
spark.hoodie.datasource.write.reconcile.schema=true
spark.hoodie.datasource.write.schema.allow.auto.evolution.column.drop=true

# Details in https://kyuubi.apache.org/docs/latest/deployment/settings.html
```
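Note that `kyuubi-defaults.conf` above carries the Hudi options as `spark.hoodie.*`-prefixed entries, while the working spark-sql session sets the unprefixed `hoodie.*` keys. A quick session-level check over the Kyuubi connection can show which form the engine actually sees (my assumption, not a confirmed diagnosis, is that Hudi's DDL path reads the unprefixed keys):

```sql
-- SET <key> with no value echoes the key's current session value,
-- so these show whether each form of the config reached the engine
set hoodie.schema.on.read.enable;
set spark.hoodie.schema.on.read.enable;
```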
### Kyuubi Engine Configurations

```yaml
# (ASF license header omitted)
#
# Default system properties included when running spark-submit.
# This is useful for setting default environmental settings.
# Example:
# spark.master spark://master:7077

spark.master yarn
spark.eventLog.enabled true
spark.eventLog.dir hdfs://mycluster/spark-logs
spark.eventLog.compress true
spark.executor.logs.rolling.maxSize 10000000
spark.executor.logs.rolling.maxRetainedFiles 10
spark.yarn.jars=hdfs://mycluster/spark-jars/*.jar
spark.driver.extraClassPath /opt/module/spark-3.2.3/external_jars/*
spark.executor.extraClassPath /opt/module/spark-3.2.3/external_jars/*
# spark.serializer org.apache.spark.serializer.KryoSerializer
# spark.driver.memory 5g
# spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
spark.yarn.historyServer.address=xxxx:18080
spark.history.ui.port=18080

# HUDI_CONF
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.sql.catalog.spark_catalog org.apache.spark.sql.hudi.catalog.HoodieCatalog
spark.sql.extensions org.apache.spark.sql.hudi.HoodieSparkSessionExtension,org.apache.kyuubi.plugin.spark.authz.ranger.RangerSparkExtension
#spark.sql.warehouse.dir file:///tmp/hudi-bundles/hive/warehouse
spark.sql.warehouse.dir hdfs://xxx:8020/user/hive/warehouse
spark.default.parallelism 8
spark.sql.shuffle.partitions 8
spark.sql.parquet.datetimeRebaseModeInRead CORRECTED

# spark optimize
spark.kryoserializer.buffer.max=254
spark.executor.memory 3G
spark.executor.cores 3
spark.executor.heartbeatInterval 200000
spark.network.timeout 300000
spark.driver.cores=2
spark.driver.memory=3g
spark.driver.maxResultSize 2g
spark.dynamicAllocation.enabled=true
## false if you prefer shuffle tracking over ESS
spark.shuffle.service.enabled=true
spark.dynamicAllocation.initialExecutors=3
spark.dynamicAllocation.minExecutors=3
spark.dynamicAllocation.maxExecutors=500
spark.dynamicAllocation.executorAllocationRatio=0.5
spark.dynamicAllocation.executorIdleTimeout=60s
spark.dynamicAllocation.cachedExecutorIdleTimeout=30min
## true if you prefer shuffle tracking over ESS
spark.dynamicAllocation.shuffleTracking.enabled=false
spark.dynamicAllocation.shuffleTracking.timeout=30min
spark.dynamicAllocation.schedulerBacklogTimeout=1s
spark.dynamicAllocation.sustainedSchedulerBacklogTimeout=1s
spark.cleaner.periodicGC.interval=5min

# hadoop ha
spark.hadoop.user.name=xxx
spark.hadoop.fs.defaultFS=hdfs://mycluster
```

### Additional context

_No response_

### Are you willing to submit PR?

- [ ] Yes. I would be willing to submit a PR with guidance from the Kyuubi community to fix.
- [X] No. I cannot submit a PR at this time.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
