Hi TianYi,

It looks like DeprecatedParquetInputFormat also accepts mapred FileSplits [1]. If you create a simple wrapper input format that delegates to FileInputFormat for split planning and to DeprecatedParquetInputFormat for everything else, you could use CombineFileInputFormat with that.

We kept the split wrapper class you linked to when we moved to planning with file splits, so that anyone casting the splits they get back to the wrapper class or ParquetInputSplit wouldn't break.
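
A rough sketch of the delegation idea, in case it helps. It uses stand-in types only, so it runs without Hadoop or Parquet on the classpath: `Split`, `SplitPlanner`, `ReaderFactory`, `Wrapper`, `planFileSplits`, and `openParquetReader` are all hypothetical names, not real Hadoop or Parquet APIs. In a real job the planner role would be played by FileInputFormat and the reader role by DeprecatedParquetInputFormat.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in model only: none of these types are real Hadoop or Parquet classes.
final class DelegatingFormatSketch {

    /** Stand-in for a mapred FileSplit: one byte range of one file. */
    static final class Split {
        final String path;
        final long start;
        final long length;

        Split(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }
    }

    /** The split-planning half of an input format. */
    interface SplitPlanner {
        List<Split> plan(String path, long fileLength, long splitSize);
    }

    /** The record-reading half of an input format. */
    interface ReaderFactory {
        String open(Split split);
    }

    /** The wrapper: planning goes to one delegate, reading to the other. */
    static final class Wrapper {
        private final SplitPlanner planner;
        private final ReaderFactory readers;

        Wrapper(SplitPlanner planner, ReaderFactory readers) {
            this.planner = planner;
            this.readers = readers;
        }

        List<Split> getSplits(String path, long fileLength, long splitSize) {
            return planner.plan(path, fileLength, splitSize);
        }

        String getRecordReader(Split split) {
            return readers.open(split);
        }
    }

    /** Plays the FileInputFormat role: cut a file into fixed-size byte ranges. */
    static List<Split> planFileSplits(String path, long fileLength, long splitSize) {
        List<Split> splits = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitSize) {
            splits.add(new Split(path, start, Math.min(splitSize, fileLength - start)));
        }
        return splits;
    }

    /** Plays the DeprecatedParquetInputFormat role: accepts a plain file split. */
    static String openParquetReader(Split split) {
        return "parquet-reader(" + split.path + "@" + split.start + "+" + split.length + ")";
    }

    public static void main(String[] args) {
        Wrapper wrapper = new Wrapper(
            DelegatingFormatSketch::planFileSplits,
            DelegatingFormatSketch::openParquetReader);
        for (Split split : wrapper.getSplits("part-00000.parquet", 250, 100)) {
            System.out.println(wrapper.getRecordReader(split));
        }
    }
}
```

The point of the shape is that the splits handed out are plain file splits, so nothing Parquet-specific leaks into split planning; only the reader side needs to know about Parquet.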
rb

[1]: https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/mapred/DeprecatedParquetInputFormat.java#L95

On Thu, Feb 25, 2016 at 7:57 PM, Zhu, Tianyi wrote:

> Hi all,
>
> I'm trying to read lots of small Parquet files using Scalding (Cascading),
> and I was wondering whether Parquet is compatible with
> CombineFileInputFormat, so that I can read more files in one mapper.
> However, Cascading is still using the following class:
>
> https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/mapred/DeprecatedParquetInputFormat.java#L176
>
> This class implements the InputSplit interface (incompatible with
> CombineFileInputFormat) rather than extending the FileSplit class
> (compatible with CombineFileInputFormat).
>
> Is it possible to remove this private class and use ParquetInputSplit
> instead?
>
> Thanks.
>
> --
> Ciao,
> TianYi ZHU

--
Ryan Blue
Software Engineer
Netflix
