[jira] [Commented] (HUDI-2045) Support Read Hoodie As DataSource Table For Flink And DeltaStreamer

ASF GitHub Bot (Jira) Wed, 07 Jul 2021 07:28:08 -0700


    [ https://issues.apache.org/jira/browse/HUDI-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17376607#comment-17376607 ]


ASF GitHub Bot commented on HUDI-2045:
--------------------------------------

yanghua commented on a change in pull request #3120:
URL: https://github.com/apache/hudi/pull/3120#discussion_r665350394



##########
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala
##########
@@ -36,24 +35,21 @@ import org.apache.hudi.common.util.{CommitUtils, ReflectionUtils}
 import org.apache.hudi.config.HoodieBootstrapConfig.{BOOTSTRAP_BASE_PATH_PROP, BOOTSTRAP_INDEX_CLASS_PROP}
 import org.apache.hudi.config.HoodieWriteConfig
 import org.apache.hudi.exception.HoodieException
-import org.apache.hudi.hive.util.ConfigUtils
 import org.apache.hudi.hive.{HiveSyncConfig, HiveSyncTool}
 import org.apache.hudi.internal.DataSourceInternalWriterHelper
-import org.apache.hudi.keygen.factory.HoodieSparkKeyGeneratorFactory
 import org.apache.hudi.sync.common.AbstractSyncTool
 import org.apache.log4j.LogManager
 import org.apache.spark.SPARK_VERSION
 import org.apache.spark.SparkContext
 import org.apache.spark.api.java.JavaSparkContext
 import org.apache.spark.rdd.RDD
-import org.apache.spark.sql.hudi.HoodieSqlUtils
-import org.apache.spark.sql.internal.SQLConf
-import org.apache.spark.sql.internal.StaticSQLConf.SCHEMA_STRING_LENGTH_THRESHOLD
 import org.apache.spark.sql.types.StructType
 import org.apache.spark.sql.{DataFrame, SQLContext, SaveMode, SparkSession}
 
 import scala.collection.JavaConversions._
 import scala.collection.mutable.ListBuffer
+import org.apache.hudi.keygen.factory.HoodieSparkKeyGeneratorFactory
+import org.apache.spark.sql.internal.{SQLConf, StaticSQLConf}

Review comment:
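       Wrong position: these imports should go with the other `org.apache.*` imports above, not after the `scala.*` imports.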

##########
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala
##########
@@ -26,6 +26,7 @@ import org.apache.hudi.common.model.HoodieTableType.{COPY_ON_WRITE, MERGE_ON_REA
 import org.apache.hudi.common.table.{HoodieTableMetaClient, TableSchemaResolver}
 import org.apache.hudi.exception.HoodieException
 import org.apache.hudi.hadoop.HoodieROTablePathFilter
+import org.apache.hudi.hive.util.ConfigUtils

Review comment:
       Separate it with an empty line.

##########
File path: packaging/hudi-flink-bundle/pom.xml
##########
@@ -141,6 +141,13 @@
 
                   <include>org.apache.hbase:hbase-common</include>
                   <include>commons-codec:commons-codec</include>
+                  <include>org.apache.spark:spark-sql_${scala.binary.version}</include>

Review comment:
       @danny0405 Any thoughts on how we could optimize this? IMO, it does not look very graceful.

##########
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java
##########
@@ -160,6 +168,8 @@ public String toString() {
       + ", supportTimestamp=" + supportTimestamp
       + ", decodePartition=" + decodePartition
       + ", createManagedTable=" + createManagedTable
+      + ", saveAsSparkDataSourceTable=" + syncAsSparkDataSourceTable

Review comment:
       `save` -> `sync`?

##########
File path: hudi-sync/hudi-hive-sync/src/test/java/org/apache/hudi/hive/TestHiveSyncTool.java
##########
@@ -157,17 +160,15 @@ public void testBasicSync(boolean useJdbc, boolean useSchemaFromCommitMetadata)
   }
 
   @ParameterizedTest
-  @MethodSource({"useJdbcAndSchemaFromCommitMetadata"})
+  @MethodSource({"useJdbcAndSchemaFromCommitMetadataAndSaveAsDataSource"})
   public void testSyncCOWTableWithProperties(boolean useJdbc,
-                                             boolean useSchemaFromCommitMetadata) throws Exception {
+                                             boolean useSchemaFromCommitMetadata,
+                                             boolean saveAsDataSourceTable) throws Exception {

Review comment:
       `save` -> `sync`?
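The renamed `@MethodSource` adds a third boolean (`saveAsDataSourceTable`) to the parameterized test. The body of the source method is not part of this diff; assuming it enumerates every combination of the three flags, the rows it yields could be generated as in the sketch below. The class and method names are hypothetical, and the sketch uses plain Java rather than JUnit 5 (where a `@MethodSource` factory would typically return a `Stream<Arguments>`) so that it runs without test dependencies.

```java
import java.util.ArrayList;
import java.util.List;

public class BooleanCombinationsSketch {

  /** Hypothetical helper: returns all 2^n combinations of n boolean flags, one array per test row. */
  static List<boolean[]> allCombinations(int n) {
    List<boolean[]> rows = new ArrayList<>();
    for (int mask = 0; mask < (1 << n); mask++) {
      boolean[] row = new boolean[n];
      for (int bit = 0; bit < n; bit++) {
        // Bit `bit` of `mask` decides the value of flag number `bit`.
        row[bit] = (mask & (1 << bit)) != 0;
      }
      rows.add(row);
    }
    return rows;
  }

  public static void main(String[] args) {
    // Flag order assumed to match the test signature: useJdbc, useSchemaFromCommitMetadata, saveAsDataSourceTable.
    for (boolean[] row : allCombinations(3)) {
      System.out.println("useJdbc=" + row[0]
          + ", useSchemaFromCommitMetadata=" + row[1]
          + ", saveAsDataSourceTable=" + row[2]);
    }
  }
}
```

Each added flag doubles the number of test runs (8 rows for three flags), which is worth keeping in mind when extending a source method this way.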




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


> Support Read Hoodie As DataSource Table For Flink And DeltaStreamer
> -------------------------------------------------------------------
>
>                 Key: HUDI-2045
>                 URL: https://issues.apache.org/jira/browse/HUDI-2045
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: Hive Integration
>            Reporter: pengzhiwei
>            Assignee: pengzhiwei
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.9.0
>
>
> Currently, reading a Hoodie table as a datasource table is only supported for
> Spark, since [https://github.com/apache/hudi/pull/2283].
> To support this feature for Flink and DeltaStreamer, HiveSyncTool needs to
> sync the Spark table properties required by a datasource table to the
> metastore.
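As a rough illustration of what syncing "the Spark table properties needed by datasource table" involves: Spark's Hive catalog recognizes a datasource table through table properties such as `spark.sql.sources.provider`, and stores the schema JSON under `spark.sql.sources.schema.numParts` and `spark.sql.sources.schema.part.N`, split into chunks no longer than `spark.sql.sources.schemaStringLengthThreshold` (default 4000; this is the `SCHEMA_STRING_LENGTH_THRESHOLD` imported in the diff above). The following is a minimal sketch, assuming those property keys and a provider value of `hudi`; the class and method names are hypothetical and this is not HiveSyncTool's actual code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SparkTablePropsSketch {
  // Property keys used by Spark's Hive external catalog for datasource tables.
  static final String PROVIDER = "spark.sql.sources.provider";
  static final String NUM_PARTS = "spark.sql.sources.schema.numParts";
  static final String PART_PREFIX = "spark.sql.sources.schema.part.";
  static final String NUM_PART_COLS = "spark.sql.sources.schema.numPartCols";
  static final String PART_COL_PREFIX = "spark.sql.sources.schema.partCol.";

  /** Hypothetical helper: builds the properties a sync tool could write to the metastore. */
  static Map<String, String> sparkTableProperties(String schemaJson, List<String> partitionColumns,
                                                  int threshold) {
    if (threshold <= 0) {
      throw new IllegalArgumentException("threshold must be positive");
    }
    Map<String, String> props = new LinkedHashMap<>();
    // Assumption: "hudi" is the provider name Spark should resolve for the table.
    props.put(PROVIDER, "hudi");
    // Split the schema JSON into consecutive chunks of at most `threshold` characters.
    int numParts = 0;
    for (int start = 0; start < schemaJson.length(); start += threshold) {
      int end = Math.min(start + threshold, schemaJson.length());
      props.put(PART_PREFIX + numParts, schemaJson.substring(start, end));
      numParts++;
    }
    props.put(NUM_PARTS, String.valueOf(numParts));
    // Record partition columns in order.
    props.put(NUM_PART_COLS, String.valueOf(partitionColumns.size()));
    for (int i = 0; i < partitionColumns.size(); i++) {
      props.put(PART_COL_PREFIX + i, partitionColumns.get(i));
    }
    return props;
  }

  public static void main(String[] args) {
    String schema = "{\"type\":\"struct\",\"fields\":[]}";
    // A tiny threshold is used here only to show the splitting; Spark's default is 4000.
    sparkTableProperties(schema, List.of("dt"), 10)
        .forEach((k, v) -> System.out.println(k + "=" + v));
  }
}
```

A reader reassembles the schema by concatenating `part.0` through `part.(numParts - 1)` in order. Splitting is presumably needed because metastore property values can be length-limited.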



--
This message was sent by Atlassian Jira
(v8.3.4#803005)