[ https://issues.apache.org/jira/browse/HUDI-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378552#comment-17378552 ]
ASF GitHub Bot commented on HUDI-2161:
--------------------------------------
nsivabalan commented on a change in pull request #3247:
URL: https://github.com/apache/hudi/pull/3247#discussion_r667395541
##########
File path:
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/keygen/BuiltinKeyGenerator.java
##########
@@ -81,7 +99,55 @@ public String getPartitionPath(Row row) {
return getKey(genericRecord).getPartitionPath();
}
- void buildFieldPositionMapIfNeeded(StructType structType) {
+ /**
+ * Fetch partition path from {@link InternalRow}.
+ *
+ * @param internalRow {@link InternalRow} instance from which the partition path needs to be fetched.
+ * @param structType schema of the internalRow.
+ * @return the partition path.
+ */
+ public String getPartitionPath(InternalRow internalRow, StructType structType) {
+ Row row = null;
+ try {
+ row = deserializeRow(getEncoder(structType), internalRow);
+ } catch (Exception e) {
+ throw new IllegalStateException("Conversion of InternalRow to Row failed", e);
+ }
+ return getPartitionPath(row);
+ }
+
+ private ExpressionEncoder getEncoder(StructType structType) {
+ if (encoder == null) {
+ synchronized (this) {
+ if (encoder == null) {
+ encoder = getRowEncoder(structType);
+ }
+ }
+ }
+ return encoder;
+ }
+
+ private static ExpressionEncoder getRowEncoder(StructType schema) {
+ List<Attribute> attributes = JavaConversions.asJavaCollection(schema.toAttributes()).stream()
+ .map(Attribute::toAttribute).collect(Collectors.toList());
+ return RowEncoder.apply(schema)
+ .resolveAndBind(JavaConverters.asScalaBufferConverter(attributes).asScala().toSeq(),
+ SimpleAnalyzer$.MODULE$);
+ }
+
+ private static Row deserializeRow(ExpressionEncoder encoder, InternalRow row)
Review comment:
Yet to test this method. I found a similar method for serializeFromRow
and came up with this one. But I have fixed all the other built-in key generators
(simple, complex, timestamp, custom), and will update the patch once I have the
fix for this one. I wanted to open the PR for review while I work on the rest.
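The `getEncoder` helper in the diff lazily caches the row encoder under `synchronized (this)`. For comparison, the stdlib-only sketch below shows the standard double-checked locking idiom for that kind of cache: a `volatile` field, a lock-free fast path, and a second null check inside the lock so that only one thread builds the object. `SchemaEncoder` and `LazyEncoderHolder` are hypothetical stand-ins, not Spark's `ExpressionEncoder` or Hudi's `BuiltinKeyGenerator`, and the schema is modeled as a plain string.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an expensive, schema-bound object such as Spark's ExpressionEncoder.
final class SchemaEncoder {
  final String schema;

  SchemaEncoder(String schema) {
    this.schema = schema;
  }
}

// Lazily builds a single SchemaEncoder per holder, even when called from many threads.
final class LazyEncoderHolder {
  private final AtomicInteger builds = new AtomicInteger();
  // volatile: once non-null, every thread sees a fully constructed encoder.
  private volatile SchemaEncoder encoder;

  SchemaEncoder getEncoder(String schema) {
    SchemaEncoder local = encoder; // fast path: one volatile read, no lock
    if (local == null) {
      synchronized (this) {
        local = encoder; // re-read under the lock
        if (local == null) { // second check: another thread may have built it first
          local = new SchemaEncoder(schema);
          builds.incrementAndGet();
          encoder = local;
        }
      }
    }
    return local;
  }

  int buildCount() {
    return builds.get();
  }

  // Demo helper: calls getEncoder from several threads and returns how many encoders were built.
  static int buildsUnderContention(int threadCount) {
    LazyEncoderHolder holder = new LazyEncoderHolder();
    Thread[] threads = new Thread[threadCount];
    for (int i = 0; i < threadCount; i++) {
      threads[i] = new Thread(() -> holder.getEncoder("id:int,name:string"));
      threads[i].start();
    }
    for (Thread t : threads) {
      try {
        t.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    return holder.buildCount();
  }
}
```

Like the `getEncoder` in the diff, this caches whatever encoder the first caller builds and ignores the schema on later calls, so it is only safe when a given key generator instance always sees the same schema.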
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
> Add support to disable meta column to BulkInsert Row Writer path
> ----------------------------------------------------------------
>
> Key: HUDI-2161
> URL: https://issues.apache.org/jira/browse/HUDI-2161
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: sivabalan narayanan
> Priority: Major
> Labels: pull-request-available
>
> The objective here is to disable all meta columns to avoid their storage cost.
> Disabling them could also reduce write latency on the row writer path, since
> no special handling is then required at the RowCreateHandle layer.
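The latency argument in the issue description can be illustrated with a small stdlib-only sketch. It assumes a row is a plain `List<Object>` and uses a hypothetical `MetaFieldsSketch` class with a hypothetical `populateMetaFields` flag; neither is Hudi's actual `RowCreateHandle` code. The five column names are Hudi's standard meta columns. With the flag on, every row is copied into a wider row that carries the meta values first; with it off, the incoming row is returned untouched, which models the "no special handling" case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative model only: a row is a List<Object>; this is not Hudi's RowCreateHandle API.
final class MetaFieldsSketch {
  // Hudi's five standard meta column names.
  static final List<String> META_COLUMNS = Collections.unmodifiableList(Arrays.asList(
      "_hoodie_commit_time", "_hoodie_commit_seqno", "_hoodie_record_key",
      "_hoodie_partition_path", "_hoodie_file_name"));

  // Hypothetical flag: false models "meta columns disabled".
  private final boolean populateMetaFields;

  MetaFieldsSketch(boolean populateMetaFields) {
    this.populateMetaFields = populateMetaFields;
  }

  // Returns the row that would be handed to the file writer.
  List<Object> toWrittenRow(List<String> metaValues, List<Object> dataRow) {
    if (!populateMetaFields) {
      return dataRow; // pass-through: no copy and no extra columns
    }
    if (metaValues.size() != META_COLUMNS.size()) {
      throw new IllegalArgumentException("expected " + META_COLUMNS.size() + " meta values");
    }
    List<Object> out = new ArrayList<>(META_COLUMNS.size() + dataRow.size());
    out.addAll(metaValues); // meta columns first, then the user's columns
    out.addAll(dataRow);
    return out;
  }
}
```

In this model, the pass-through branch skips both the per-row copy and the five extra columns written to storage, which is where the expected storage and latency savings would come from.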
--
This message was sent by Atlassian Jira
(v8.3.4#803005)