[jira] [Commented] (HUDI-2161) Add support to disable meta column to BulkInsert Row Writer path

ASF GitHub Bot (Jira) Fri, 16 Jul 2021 09:27:05 -0700


    [ 
https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17382189#comment-17382189
 ]


ASF GitHub Bot commented on HUDI-2161:
--------------------------------------

nsivabalan commented on a change in pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#discussion_r671381130



##########
File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/keygen/BuiltinKeyGenerator.java
##########
@@ -81,7 +105,52 @@ public String getPartitionPath(Row row) {
     return getKey(genericRecord).getPartitionPath();
   }
 
-  void buildFieldPositionMapIfNeeded(StructType structType) {
+  /**
+   * Fetch partition path from {@link InternalRow}.
+   *
+   * @param internalRow {@link InternalRow} instance from which partition path needs to be fetched from.
+   * @param structType  schema of the internalRow.
+   * @return the partition path.
+   */
+  public String getPartitionPath(InternalRow internalRow, StructType structType) {
+    try {
+      Row row = deserializeRow(getEncoder(structType), internalRow);
+      return getPartitionPath(row);
+    } catch (Exception e) {
+      throw new HoodieIOException("Conversion of InternalRow to Row failed with exception " + e);
+    }
+  }
+
+  private ExpressionEncoder getEncoder(StructType structType) {
+    if (encoder == null) {
+      encoder = getRowEncoder(structType);
+    }
+    return encoder;
+  }
+
+  private static ExpressionEncoder getRowEncoder(StructType schema) {
+    List<Attribute> attributes = JavaConversions.asJavaCollection(schema.toAttributes()).stream()
+        .map(Attribute::toAttribute).collect(Collectors.toList());
+    return RowEncoder.apply(schema)
+        .resolveAndBind(JavaConverters.asScalaBufferConverter(attributes).asScala().toSeq(),
+            SimpleAnalyzer$.MODULE$);
+  }
+
+  private static Row deserializeRow(ExpressionEncoder encoder, InternalRow row)
+      throws InvocationTargetException, IllegalAccessException, NoSuchMethodException, ClassNotFoundException {
+    // TODO remove reflection if Spark 2.x support is dropped
+    if (package$.MODULE$.SPARK_VERSION().startsWith("2.")) {

Review comment:
       I could not find any. 
   ```
   grep -irl "import org.apache.spark.sql.catalyst.expressions.Attribute" hudi-*/
   hudi-client//hudi-spark-client/src/test/java/org/apache/hudi/testutils/SparkDatasetTestUtils.java
   hudi-client//hudi-spark-client/src/test/java/org/apache/hudi/testutils/KeyGeneratorTestUtilities.java
   hudi-client//hudi-spark-client/src/main/java/org/apache/hudi/keygen/BuiltinKeyGenerator.java
   hudi-integ-test//src/main/java/org/apache/hudi/integ/testsuite/dag/nodes/ValidateDatasetNode.java
   hudi-spark-datasource//hudi-spark/src/test/java/org/apache/hudi/TestHoodieDatasetBulkInsertHelper.java
   hudi-spark-datasource//hudi-spark/src/main/java/org/apache/hudi/SparkRowWriteHelper.java
   hudi-spark-datasource//hudi-spark/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodieAnalysis.scala
   ```
   In most of these places we do have a static method to get the encoder, but
deserializeRow is the first of its kind. We do have serializeRow in
KeyGeneratorTestUtilities, but it converts in the opposite direction
(Row -> InternalRow) and it is part of the test code.
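
For readers following the hunk above, here is a minimal, self-contained sketch of the version-gated reflection pattern that deserializeRow appears to rely on. It is not the PR's code. LegacyEncoder, ModernEncoder and Deserializer are hypothetical stand-ins, and the method names fromRow and createDeserializer().apply reflect my understanding of the Spark 2.x and 3.x encoder APIs, which the quoted diff does not show.

```java
import java.lang.reflect.Method;

public class VersionedDeserializeSketch {

  // Hypothetical stand-in for a Spark 2.x style encoder that exposes fromRow(...).
  public static class LegacyEncoder {
    public String fromRow(Object[] internalRow) {
      return "legacy:" + internalRow.length;
    }
  }

  // Hypothetical stand-in for a Spark 3.x style encoder that exposes createDeserializer().
  public static class ModernEncoder {
    public Deserializer createDeserializer() {
      return new Deserializer();
    }
  }

  // Hypothetical stand-in for the deserializer object returned by the newer API.
  public static class Deserializer {
    public String apply(Object[] internalRow) {
      return "modern:" + internalRow.length;
    }
  }

  // Picks the method by version string and calls it reflectively, so the same
  // source compiles regardless of which encoder API is on the classpath.
  public static String deserializeRow(Object encoder, Object[] internalRow, String version)
      throws ReflectiveOperationException {
    if (version.startsWith("2.")) {
      Method fromRow = encoder.getClass().getMethod("fromRow", Object[].class);
      // Cast to Object so the array is passed as one argument, not spread as varargs.
      return (String) fromRow.invoke(encoder, (Object) internalRow);
    } else {
      Object deserializer = encoder.getClass().getMethod("createDeserializer").invoke(encoder);
      Method apply = deserializer.getClass().getMethod("apply", Object[].class);
      return (String) apply.invoke(deserializer, (Object) internalRow);
    }
  }

  public static void main(String[] args) throws Exception {
    Object[] row = {"2021/07/16", 42};
    System.out.println(deserializeRow(new LegacyEncoder(), row, "2.4.4")); // prints "legacy:2"
    System.out.println(deserializeRow(new ModernEncoder(), row, "3.0.1")); // prints "modern:2"
  }
}
```

The trade-off is the usual one for reflection: a single artifact can run against either API, but a renamed method only fails at runtime, which is presumably why the hunk carries a TODO to drop the reflection once Spark 2.x support is removed.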




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


> Add support to disable meta column to BulkInsert Row Writer path
> ----------------------------------------------------------------
>
>                 Key: HUDI-2161
>                 URL: https://issues.apache.org/jira/browse/HUDI-2161
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: sivabalan narayanan
>            Priority: Major
>              Labels: pull-request-available
>
> The objective here is to disable all meta columns so as to avoid storage cost. 
> Write latency may also improve on the row writer path, since no 
> special handling is required at the RowCreateHandle layer. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
