Hi Pratyaksh,

The partitioning format is pluggable in Hudi.

1. For Hudi writing, you can use one of the several implementations of org.apache.hudi.KeyGenerator, or write your own implementation, to control the partition path format. You can configure which key generator class is used via https://hudi.incubator.apache.org/configurations.html#KEYGENERATOR_CLASS_OPT_KEY
2. For Hive syncing, there are again some default implementations of org.apache.hudi.hive.PartitionValueExtractor. You can also write your own custom partition value extractor and configure it via https://hudi.incubator.apache.org/configurations.html#HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY
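A minimal sketch of the date-formatting logic behind options 1 and 2, assuming created_at arrives as a string in the format yyyy-MM-dd HH:mm:ss.S. DatePartitionFormat and its methods are hypothetical helpers, not Hudi APIs. In a real setup this logic would sit inside a custom KeyGenerator subclass and a custom PartitionValueExtractor implementation, whose exact signatures should be checked against the Hudi version in use.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not a Hudi class): the formatting logic that a custom
// KeyGenerator and a custom PartitionValueExtractor could share.
public class DatePartitionFormat {
    // Assumed source format of created_at, as described in this thread.
    private static final DateTimeFormatter SOURCE =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.S");
    // Target partition format: one yyyy-MM-dd directory per day.
    private static final DateTimeFormatter PARTITION =
            DateTimeFormatter.ofPattern("yyyy-MM-dd");

    // Write side: map a created_at value to a partition path such as "2015-01-01".
    public static String toPartitionPath(String createdAt) {
        return LocalDateTime.parse(createdAt, SOURCE).format(PARTITION);
    }

    // Hive-sync side: a yyyy-MM-dd path maps to a single partition value
    // (e.g. for one DateStamp column). LocalDate.parse throws on malformed paths.
    public static List<String> extractPartitionValues(String partitionPath) {
        LocalDate.parse(partitionPath, PARTITION);
        return Collections.singletonList(partitionPath);
    }

    public static void main(String[] args) {
        String path = toPartitionPath("2015-01-01 10:15:30.0");
        System.out.println(path);                         // 2015-01-01
        System.out.println(extractPartitionValues(path)); // [2015-01-01]
    }
}
```

Changing the PARTITION pattern to "yyyy/MM/dd" would produce the slash layout instead; the extractor would then need to split the path into year, month and day values (or join them back into one date value), depending on how the Hive table is partitioned.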
Thanks,
Balaji.V

On Tuesday, August 13, 2019, 03:23:57 AM PDT, Pratyaksh Sharma wrote:

Hi,

I have been working on Hudi for some time and have a suggestion for an improvement.

When we build a CDC pipeline, the field used for partitioning is generally a date (created_at), and created_at is generally in the format yyyy-MM-dd HH:mm:ss.S. If this field is formatted as yyyy/MM/dd, Hive queries that fetch data between two dates, which is the usual case, become much more complex. For example:

1. If the partitions are in the format yyyy/MM/dd, the query to select data for all days between 2015-01-01 and 2015-03-01 would look like:
SELECT * FROM db.table where year=2015 and ((month=01 or month=02) or (month=03 and day=01))
2. If instead the partitions are in the format yyyy-MM-dd or yyyyMMdd, direct range queries on the data are supported. For example, the query above would look like:
SELECT * from db.table where DateStamp between '2015-01-01' and '2015-03-01'

Reference: https://community.hortonworks.com/questions/29031/best-pratices-for-hive-partitioning-especially-by.html

The proposal is to make yyyy-MM-dd the default partitioning format, or at least to provide a way to change the format. Please share your suggestions on the above. The JIRA is raised here: https://issues.apache.org/jira/browse/HUDI-206 (HUDI-206).

Regards,
Pratyaksh
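To see why a single yyyy-MM-dd partition column keeps range queries simple, the sketch below checks that the two predicates from the example above select exactly the same days around the range. It assumes DateStamp is a string partition column, so BETWEEN compares values lexicographically; that order matches date order because yyyy-MM-dd is zero-padded and runs from year to day. PartitionRangeDemo and its methods are illustrative names only.

```java
import java.time.LocalDate;

// Illustrative only: compares the two query styles from the thread in plain Java.
public class PartitionRangeDemo {
    // Single string column in yyyy-MM-dd: assumed to behave like Hive's
    // BETWEEN on strings, i.e. an inclusive lexicographic range check.
    static boolean inRangeSingleColumn(String dateStamp, String lo, String hi) {
        return dateStamp.compareTo(lo) >= 0 && dateStamp.compareTo(hi) <= 0;
    }

    // yyyy/MM/dd layout with year, month and day columns: the predicate
    // from the thread, hard-coded for 2015-01-01 .. 2015-03-01.
    static boolean inRangeThreeColumns(int year, int month, int day) {
        return year == 2015 && ((month == 1 || month == 2) || (month == 3 && day == 1));
    }

    public static void main(String[] args) {
        LocalDate end = LocalDate.of(2015, 3, 3);
        // Walk every day from just before to just after the range and make
        // sure both predicates agree; LocalDate.toString() yields yyyy-MM-dd.
        for (LocalDate d = LocalDate.of(2014, 12, 30); !d.isAfter(end); d = d.plusDays(1)) {
            boolean single = inRangeSingleColumn(d.toString(), "2015-01-01", "2015-03-01");
            boolean three = inRangeThreeColumns(d.getYear(), d.getMonthValue(), d.getDayOfMonth());
            if (single != three) {
                throw new AssertionError("predicates disagree on " + d);
            }
        }
        System.out.println("predicates agree");
    }
}
```

The three-column predicate has to be rewritten by hand for every new date range, whereas the single-column version only needs its two bounds changed.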
