Hi all,

I'm trying to read lots of small Parquet files using Scalding (Cascading), and I
was wondering whether Parquet is compatible with CombineFileInputFormat, so that
I can read more files in one mapper. However, Cascading still uses the following
class:

https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/mapred/DeprecatedParquetInputFormat.java#L176

This class implements the InputSplit interface (incompatible with
CombineFileInputFormat) rather than extending the FileSplit class (compatible
with CombineFileInputFormat).
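
To make the point about combining concrete, here is a toy Java model of the
combining step. It uses plain Java stand-ins, not the real Hadoop classes, and
ignores the node/rack locality grouping the real CombineFileInputFormat does.
Each chunk is described only by file path, start offset and length (the
information a file split carries), and chunks from several small files are
packed greedily into one combined split up to a size limit. The names
CombineSketch, FileChunk, CombinedSplit and combine() are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: simplified stand-ins, NOT Hadoop's CombineFileInputFormat or FileSplit.
public class CombineSketch {

    /** A byte range [start, start + length) of one file: what a file split describes. */
    static class FileChunk {
        final String path;
        final long start;
        final long length;

        FileChunk(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }
    }

    /** Several chunks, possibly from different files, to be read by one mapper. */
    static class CombinedSplit {
        final List<FileChunk> chunks = new ArrayList<>();
        long totalLength = 0;
    }

    /**
     * Greedily packs chunks, in order, into combined splits of at most maxSize bytes.
     * A single chunk larger than maxSize still gets a combined split of its own.
     */
    static List<CombinedSplit> combine(List<FileChunk> chunks, long maxSize) {
        List<CombinedSplit> result = new ArrayList<>();
        CombinedSplit current = new CombinedSplit();
        for (FileChunk chunk : chunks) {
            // Start a new combined split when adding this chunk would exceed the limit.
            if (!current.chunks.isEmpty() && current.totalLength + chunk.length > maxSize) {
                result.add(current);
                current = new CombinedSplit();
            }
            current.chunks.add(chunk);
            current.totalLength += chunk.length;
        }
        if (!current.chunks.isEmpty()) {
            result.add(current);
        }
        return result;
    }

    public static void main(String[] args) {
        List<FileChunk> smallFiles = List.of(
                new FileChunk("part-00000.parquet", 0, 40),
                new FileChunk("part-00001.parquet", 0, 50),
                new FileChunk("part-00002.parquet", 0, 30));
        // 40 + 50 fit under 100 bytes; adding 30 would not, so the third file starts a new split.
        for (CombinedSplit split : combine(smallFiles, 100)) {
            System.out.println(split.chunks.size() + " file(s), " + split.totalLength + " bytes");
        }
    }
}
```

Running main() prints "2 file(s), 90 bytes" and then "1 file(s), 30 bytes":
three small files end up in two mappers instead of three.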

Would it be possible to remove this private class and use ParquetInputSplit
instead?
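
Here is a second toy model of why the split type matters. It assumes that a
combining reader keeps only path/start/length for each chunk and hands the
underlying reader a freshly built plain file split. A reader that insists on
its own private wrapper type rejects that split, while a reader written against
the file split type accepts it (and would equally accept any subclass of it).
All types here (SplitTypeSketch, Split, FileSplit, OpaqueWrapper and the reader
methods) are hypothetical stand-ins, not parquet-mr or Hadoop classes.

```java
import java.util.function.Function;

// Toy model only: simplified stand-ins, NOT Hadoop's or parquet-mr's split classes.
public class SplitTypeSketch {

    /** Minimal split contract: a length and nothing else. */
    interface Split {
        long getLength();
    }

    /** File-style split: a byte range of one file. */
    static class FileSplit implements Split {
        final String path;
        final long start;
        final long length;

        FileSplit(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }

        @Override
        public long getLength() {
            return length;
        }
    }

    /** Hypothetical private wrapper that implements Split directly instead of extending FileSplit. */
    static class OpaqueWrapper implements Split {
        final FileSplit inner;

        OpaqueWrapper(FileSplit inner) {
            this.inner = inner;
        }

        @Override
        public long getLength() {
            return inner.getLength();
        }
    }

    /** Reader that only understands its own wrapper type. */
    static String wrapperOnlyReader(Split split) {
        if (!(split instanceof OpaqueWrapper)) {
            throw new IllegalArgumentException("unexpected split type: " + split.getClass().getSimpleName());
        }
        return "read " + ((OpaqueWrapper) split).inner.path;
    }

    /** Reader that accepts any FileSplit, including subclasses of it. */
    static String fileSplitReader(Split split) {
        if (!(split instanceof FileSplit)) {
            throw new IllegalArgumentException("unexpected split type: " + split.getClass().getSimpleName());
        }
        return "read " + ((FileSplit) split).path;
    }

    /** Models a combining reader: it keeps only path/start/length and rebuilds a plain FileSplit. */
    static String readCombinedChunk(String path, long start, long length, Function<Split, String> reader) {
        return reader.apply(new FileSplit(path, start, length));
    }

    public static void main(String[] args) {
        // Works: this reader accepts the rebuilt plain FileSplit.
        System.out.println(readCombinedChunk("a.parquet", 0, 40, SplitTypeSketch::fileSplitReader));
        try {
            readCombinedChunk("a.parquet", 0, 40, SplitTypeSketch::wrapperOnlyReader);
        } catch (IllegalArgumentException e) {
            // Fails: the rebuilt split is not the private wrapper type.
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

In this model, making the format's own split a FileSplit subclass (which is
roughly what using ParquetInputSplit would mean, if that class extends FileSplit)
lets one reader handle both its own splits and the combined ones.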

Thanks.

--
Ciao,
TianYi ZHU
