bhat-vinay opened a new pull request, #10272:
URL: https://github.com/apache/hudi/pull/10272

   Issue:
   Two configs, when set in a certain combination, cause exceptions or 
assertion failures:
   1. Configs to disable populating metadata fields (for each row)
   2. Configs to drop partition columns (to save storage space) from a row
   
   With both #1 and #2 set, partition paths cannot be deduced from the 
partition columns, because those columns are dropped higher up the stack. 
BulkInsertDataInternalWriterHelper::write(...) relied on the metadata fields to 
extract the partition path in such cases, but with #1 set the metadata fields 
are not populated, resulting in asserts/exceptions.
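
For reference, the failing combination can be sketched as a set of writer options. The PR text does not name the config keys; the two keys below are an assumption based on Hudi's documented configs (`hoodie.populate.meta.fields` and `hoodie.datasource.write.drop.partition.columns`), and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FailingConfigSketch {

    // Returns the option combination described above.
    // Key names are assumed, not taken from the PR description.
    static Map<String, String> failingOptions() {
        Map<String, String> opts = new LinkedHashMap<>();
        // #1: do not populate per-row metadata fields
        opts.put("hoodie.populate.meta.fields", "false");
        // #2: drop partition columns from each row to save storage
        opts.put("hoodie.datasource.write.drop.partition.columns", "true");
        return opts;
    }

    public static void main(String[] args) {
        failingOptions().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```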
   
   The fix pushes the dropping of partition columns further down the stack, to 
after the partition path has been computed. It manipulates the raw 
'InternalRow' structure by copying only the relevant fields into a new 
'InternalRow'. Each row is processed individually: its partition columns are 
dropped and the remaining fields are copied to a new 'InternalRow'.
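
The ordering described here (compute the partition path first, then copy only the non-partition fields into a fresh row) can be illustrated with a minimal, Spark-free sketch. This is not the code in the PR: a plain `Object[]` stands in for Spark's `InternalRow`, and the class name, method names, and the `field=value` path layout are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DropPartitionColumnsSketch {

    // Builds a partition path from the partition column values.
    // Must run BEFORE the partition columns are dropped from the row.
    static String partitionPath(Object[] row, String[] fieldNames, List<String> partitionFields) {
        StringBuilder path = new StringBuilder();
        for (int i = 0; i < fieldNames.length; i++) {
            if (partitionFields.contains(fieldNames[i])) {
                if (path.length() > 0) {
                    path.append('/');
                }
                path.append(fieldNames[i]).append('=').append(row[i]);
            }
        }
        return path.toString();
    }

    // Copies every non-partition field into a new row, leaving the input untouched.
    static Object[] dropPartitionColumns(Object[] row, String[] fieldNames, List<String> partitionFields) {
        List<Object> kept = new ArrayList<>();
        for (int i = 0; i < fieldNames.length; i++) {
            if (!partitionFields.contains(fieldNames[i])) {
                kept.add(row[i]);
            }
        }
        return kept.toArray();
    }

    public static void main(String[] args) {
        String[] fieldNames = {"id", "region", "amount"};
        Object[] row = {1, "eu", 9.5};
        List<String> partitionFields = Arrays.asList("region");

        // Step 1: derive the partition path while the column is still present.
        String path = partitionPath(row, fieldNames, partitionFields);
        // Step 2: only now drop the partition column by copying the rest.
        Object[] trimmed = dropPartitionColumns(row, fieldNames, partitionFields);

        System.out.println(path);
        System.out.println(Arrays.toString(trimmed));
    }
}
```

Because the copy happens once per row, the cost is proportional to the number of rows written, and it is only paid when partition columns are configured to be dropped.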
   
   ### Change Logs
   
   Dropping of partition columns is pushed down the stack so that it happens 
after the partition path is computed. 
BulkInsertDataInternalWriterHelper::write(...) no longer needs the metadata 
fields to extract the partition path when partition columns are dropped. For 
each row, only the relevant fields are copied into a new 'InternalRow'.
   
   ### Impact
   
   No public API or user-facing changes. However, each InternalRow is 
processed and copied individually when the config to drop partition columns is 
set.
   
   ### Risk level (write none, low medium or high below)
   
   None
   
   ### Documentation Update
   
   None
   

