[ https://issues.apache.org/jira/browse/HUDI-3995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexey Kudinkin updated HUDI-3995:
----------------------------------
Description:
*EDIT*
*====*
While investigating perf hits in Bulk Insert, a few issues were found:
# NonPartitionedKeyGenerator does not implement `getRecordKey` or
`getPartitionKey` for `InternalRow`, so the default implementation is
invoked, which converts the row to Avro.
# HUDI-3993: Using a UDF to fetch record keys similarly has to deserialize
`InternalRow` into `Row`.
was:
*EDIT*
*-----*
While investigating perf hits in Bulk Insert, a few issues were found:
# NonPartitionedKeyGenerator does not implement `getRecordKey` or
`getPartitionKey` for `InternalRow`, so the default implementation is
invoked, which converts the row to Avro.
# HUDI-3993: Using a UDF to fetch record keys similarly has to deserialize
`InternalRow` into `Row`.
> Bulk insert row writer perf improvements
> ----------------------------------------
>
> Key: HUDI-3995
> URL: https://issues.apache.org/jira/browse/HUDI-3995
> Project: Apache Hudi
> Issue Type: Improvement
> Components: spark, writer-core
> Reporter: sivabalan narayanan
> Assignee: sivabalan narayanan
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.12.0
>
> Original Estimate: 12h
> Remaining Estimate: 12h
>
> *EDIT*
> *====*
> While investigating perf hits in Bulk Insert, a few issues were found:
> # NonPartitionedKeyGenerator does not implement `getRecordKey` or
> `getPartitionKey` for `InternalRow`, so the default implementation is
> invoked, which converts the row to Avro.
> # HUDI-3993: Using a UDF to fetch record keys similarly has to deserialize
> `InternalRow` into `Row`.
>
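The first issue above can be illustrated with a minimal, self-contained sketch. It is not Hudi code: `FastRow`, `BaseKeyGen`, `InheritingKeyGen`, and `OverridingKeyGen` are hypothetical stand-ins for `InternalRow` and the key generator classes, and a `Map` stands in for the Avro record. It only shows the general pattern: a base class whose default row path converts to a generic record, and a subclass that either inherits that slow path or overrides it to read the key field directly.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins only; none of these types are Hudi or Spark classes.
public class KeyGenSketch {

    /** Stand-in for a compact engine-internal row (plays the role of InternalRow). */
    static final class FastRow {
        final Object[] values;
        FastRow(Object... values) { this.values = values; }
    }

    /** Base class whose default row path converts to a generic record first. */
    static abstract class BaseKeyGen {
        int conversions = 0; // counts how often the slow conversion path ran

        abstract String getRecordKey(Map<String, Object> record);

        // Default implementation: materialize a generic record, then delegate.
        String getRecordKey(FastRow row, String[] fieldNames) {
            conversions++;
            Map<String, Object> record = new HashMap<>();
            for (int i = 0; i < fieldNames.length; i++) {
                record.put(fieldNames[i], row.values[i]);
            }
            return getRecordKey(record);
        }
    }

    /** Does not override the row path, so every call pays for the conversion. */
    static class InheritingKeyGen extends BaseKeyGen {
        final String keyField;
        InheritingKeyGen(String keyField) { this.keyField = keyField; }

        @Override
        String getRecordKey(Map<String, Object> record) {
            return String.valueOf(record.get(keyField));
        }
    }

    /** Overrides the row path to read the key field by ordinal, skipping conversion. */
    static class OverridingKeyGen extends InheritingKeyGen {
        OverridingKeyGen(String keyField) { super(keyField); }

        @Override
        String getRecordKey(FastRow row, String[] fieldNames) {
            for (int i = 0; i < fieldNames.length; i++) {
                if (fieldNames[i].equals(keyField)) {
                    return String.valueOf(row.values[i]);
                }
            }
            throw new IllegalArgumentException("unknown field: " + keyField);
        }
    }

    public static void main(String[] args) {
        String[] fields = {"id", "name"};
        FastRow row = new FastRow(42, "a");
        InheritingKeyGen slow = new InheritingKeyGen("id");
        OverridingKeyGen fast = new OverridingKeyGen("id");
        System.out.println(slow.getRecordKey(row, fields) + " conversions=" + slow.conversions);
        System.out.println(fast.getRecordKey(row, fields) + " conversions=" + fast.conversions);
    }
}
```

Both generators return the same key; the difference is that only the inheriting one builds an intermediate record per row, which is the kind of per-record overhead the description attributes to the default implementation.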