Hi all,

I'm trying to read lots of small Parquet files using Scalding (Cascading), and I was wondering whether Parquet is compatible with CombineFileInputFormat, so that I can read more than one file per mapper. However, Cascading still uses the following class:
https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/mapred/DeprecatedParquetInputFormat.java#L176

This class implements the InputSplit interface (incompatible with CombineFileInputFormat) rather than extending the FileSplit class (compatible with CombineFileInputFormat). Would it be possible to remove this private class and use ParquetInputSplit instead?

Thanks.

--
Ciao,
TianYi ZHU
