Hi

I recently tried PrestoDB, liked it, and came across the Hudi project.
After going through a few pages and the source code, it seems that Hudi has
its own InputFormat.

I am attempting to do early predicate pushdown on my side. The question is
probably not specific to Hudi, but I wanted to ask whether someone with
experience using both PrestoDB and Hive could enlighten me.

Imagine I create a Hive table using my own customInputFormat. I see that Hudi
has contributed an annotation which allows PrestoDB to obtain its splits from
the customInputFormat.

For simplicity, the Hive table consists of two columns: someid and anotherid.

Imagine the files in HDFS are laid out as /some/folder/someid.anotherid.someformat

and a query such as select * from hive_table where anotherid = 'abc'.

What I want to do is capture the above query, so that when PrestoDB queries
the Hive metadata for the table and gets back my customInputFormat, I could
use a glob expression in the getSplits method to keep only those files that
satisfy the condition anotherid = 'abc', before the handoff to query execution
in Presto.
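
To make the filtering step concrete, here is a rough sketch of what I have in
mind, using only the Java standard library rather than the Hadoop or Presto
APIs. The class GlobSplitFilter and the method filterByAnotherId are
hypothetical names I made up for illustration, and the sketch assumes the
predicate value has already been obtained somehow (which is exactly the part
I am unsure about) and that it contains no glob metacharacters.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hive, Hudi, or Presto.
class GlobSplitFilter {

    // Keeps only file names of the form someid.<anotherId>.someformat.
    // Assumes anotherId holds no glob metacharacters such as '*' or '{'.
    static List<String> filterByAnotherId(List<String> fileNames, String anotherId) {
        PathMatcher matcher = FileSystems.getDefault()
                .getPathMatcher("glob:*." + anotherId + ".someformat");
        List<String> kept = new ArrayList<>();
        for (String name : fileNames) {
            // Match against the bare file name, not the full directory path.
            if (matcher.matches(Paths.get(name))) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> files = List.of(
                "1.abc.someformat", "2.xyz.someformat", "7.abc.someformat");
        // Prints the names whose anotherid part equals "abc".
        System.out.println(filterByAnotherId(files, "abc"));
    }
}
```

In a real customInputFormat, I assume the same pattern would be applied while
listing /some/folder inside getSplits, so that splits are created only for the
matching files.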

Any pointers would be useful.

Thanks,
