hudi-bot opened a new issue, #15733:
URL: https://github.com/apache/hudi/issues/15733

   When using a DeltaStreamer Transformer, columns produced by the Transformer 
cannot be used as partition path fields. This is a problem if the user wants to 
use something like the SqlQueryBasedTransformer or a custom transformer to 
derive a partition field from another field in the incoming record.
   
   In a test, I used the following configs:
   {noformat}
   hoodie.deltastreamer.transformer.sql=SELECT a.*, from_unixtime(timestamp, 'yyyy') as year, from_unixtime(timestamp, 'MM') as month, from_unixtime(timestamp, 'dd') as day, from_unixtime(timestamp, 'HH') as hour FROM <SRC> a
   hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator
   hoodie.datasource.write.partitionpath.field=year,month,day,hour
    {noformat}
   What I expect is that the data files in the output DFS are laid out under 
one directory level per partition field, like this:
   {noformat}
   /path/to/dfs/table/<year>/<month>/<day>/<hour>/
   eg:
   s3://test-bucket/table/2023/01/30/15/{noformat}
   However, I instead get the following structure:
   {noformat}
   
/path/to/dfs/table/__HIVE_DEFAULT_PARTITION__/__HIVE_DEFAULT_PARTITION__/__HIVE_DEFAULT_PARTITION__/__HIVE_DEFAULT_PARTITION__/{noformat}
   I would expect the output of Transformers to be available for partitioning 
just like any other column in the dataset.
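
   To make the expected layout concrete, here is a minimal Python sketch of the 
path the transformer SQL above should yield for one record. It is only an 
illustration, not Hudi code: the function name `expected_partition_path` is 
hypothetical, and it assumes the `timestamp` column holds epoch seconds and 
that the Spark session time zone is UTC.

```python
from datetime import datetime, timezone


def expected_partition_path(epoch_seconds: int) -> str:
    """Return the <year>/<month>/<day>/<hour> path for one record.

    Mirrors the four from_unixtime(...) columns in the transformer SQL,
    assuming epoch seconds and a UTC session time zone (both assumptions).
    """
    ts = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    # 'yyyy', 'MM', 'dd', 'HH' in Spark correspond to %Y, %m, %d, %H here.
    return ts.strftime("%Y/%m/%d/%H")


if __name__ == "__main__":
    # 1675090800 is 2023-01-30 15:00:00 UTC.
    print(expected_partition_path(1675090800))
```

   For the sample value this prints `2023/01/30/15`, matching the example path 
above.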
   
    
   
   ## JIRA info
   
   - Link: https://issues.apache.org/jira/browse/HUDI-5648
   - Type: Bug
   
   
   ---
   
   
   ## Comments
   
   30/Jan/23 02:47;cloventt;Related to this, is there some other way to 
configure DeltaStreamer to create the desired /<year>/<month>/<day>/<hour>/ 
partitioning scheme from an input EPOCHMILLISECONDS column on the data stream? 
I cannot use a date format of "yyyy/MM/dd/hh", because this breaks when the 
hoodie.datasource.write.hive_style_partitioning configuration option is set. 
You end up with a folder structure of:
   
    
   {noformat}
   /path/to/table/timestamp=2023/01/30/15{noformat}
   which is not correct.
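
   For comparison, the following Python sketch builds the layout I would 
presumably want with hive-style partitioning, with one `field=value` directory 
per partition field. It is an illustration only: the function name 
`hive_style_partition_path` is hypothetical, and it assumes the input column 
holds epoch milliseconds interpreted in UTC.

```python
from datetime import datetime, timezone


def hive_style_partition_path(epoch_millis: int) -> str:
    """Return year=/month=/day=/hour= directories for one record.

    Assumes epoch milliseconds interpreted in UTC (an assumption, not
    something the DeltaStreamer configuration guarantees).
    """
    # Integer division avoids float rounding on the millisecond value.
    ts = datetime.fromtimestamp(epoch_millis // 1000, tz=timezone.utc)
    fields = [
        ("year", ts.strftime("%Y")),
        ("month", ts.strftime("%m")),
        ("day", ts.strftime("%d")),
        ("hour", ts.strftime("%H")),
    ]
    return "/".join(f"{name}={value}" for name, value in fields)


if __name__ == "__main__":
    # 1675090800000 is 2023-01-30 15:00:00 UTC in milliseconds.
    print(hive_style_partition_path(1675090800000))
```

   For the sample value this prints `year=2023/month=01/day=30/hour=15`, as 
opposed to the single `timestamp=2023/01/30/15` prefix shown above.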
   

