Hi TianYi,

It looks like DeprecatedParquetInputFormat also accepts mapred
FileSplits [1]. So if you wrote a simple wrapper that delegates to
FileInputFormat for split planning and to DeprecatedParquetInputFormat for
everything else, you could use CombineFileInputFormat with it. We kept
the existing split wrapper class when we moved to planning with file splits
so that code casting the splits it gets back to the wrapper class or to
ParquetInputSplit wouldn't break.
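
A rough sketch of that delegation shape, in plain Java so it runs on its own.
Every type in it (FileRange, PlainFilePlanner, ParquetLikeReader,
DelegatingFormat, combine) is a made-up stand-in, not a real Hadoop or Parquet
class: FileRange plays the role of a mapred FileSplit, PlainFilePlanner the
role of FileInputFormat, ParquetLikeReader the role of
DeprecatedParquetInputFormat, and combine a much simplified
CombineFileInputFormat that groups by count rather than by size or locality.
A real wrapper would have to implement the actual mapred interfaces instead.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// All types here are simplified stand-ins, not real Hadoop or Parquet classes.
final class DelegatingFormatSketch {

    // Stand-in for a mapred FileSplit: a byte range of a single file.
    static final class FileRange {
        final String path;
        final long start;
        final long length;

        FileRange(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }
    }

    // Stand-in for the two jobs of an input format: plan splits, then read one.
    interface Format {
        List<FileRange> planSplits(Map<String, Long> fileSizes);
        List<String> read(FileRange split);
    }

    // Plays the role of FileInputFormat: plans one plain file split per file.
    static final class PlainFilePlanner {
        List<FileRange> planSplits(Map<String, Long> fileSizes) {
            List<FileRange> splits = new ArrayList<>();
            for (Map.Entry<String, Long> e : fileSizes.entrySet()) {
                splits.add(new FileRange(e.getKey(), 0L, e.getValue()));
            }
            return splits;
        }
    }

    // Plays the role of DeprecatedParquetInputFormat: reads a plain file split.
    static final class ParquetLikeReader {
        List<String> read(FileRange split) {
            return Collections.singletonList("record from " + split.path);
        }
    }

    // The wrapper: planning goes to one delegate, reading to the other.
    static final class DelegatingFormat implements Format {
        private final PlainFilePlanner planner = new PlainFilePlanner();
        private final ParquetLikeReader reader = new ParquetLikeReader();

        @Override
        public List<FileRange> planSplits(Map<String, Long> fileSizes) {
            return planner.planSplits(fileSizes);
        }

        @Override
        public List<String> read(FileRange split) {
            return reader.read(split);
        }
    }

    // Plays the role of CombineFileInputFormat, grouping by count only.
    static List<List<FileRange>> combine(List<FileRange> splits, int maxPerTask) {
        List<List<FileRange>> tasks = new ArrayList<>();
        for (int i = 0; i < splits.size(); i += maxPerTask) {
            int end = Math.min(i + maxPerTask, splits.size());
            tasks.add(new ArrayList<>(splits.subList(i, end)));
        }
        return tasks;
    }

    public static void main(String[] args) {
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("a.parquet", 10L);
        sizes.put("b.parquet", 20L);
        sizes.put("c.parquet", 30L);

        Format format = new DelegatingFormat();
        // Each inner list is the work handed to one mapper.
        for (List<FileRange> task : combine(format.planSplits(sizes), 2)) {
            for (FileRange split : task) {
                System.out.println(format.read(split));
            }
        }
    }
}
```

The point of the shape is that the combining step only ever sees plain file
splits, while reading still goes through the Parquet-aware code path.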

rb


[1]:
https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/mapred/DeprecatedParquetInputFormat.java#L95

On Thu, Feb 25, 2016 at 7:57 PM, Zhu, Tianyi <[email protected]> wrote:

> Hi all,
>
> I'm trying to read lots of small Parquet files using Scalding (Cascading),
> and I was wondering whether Parquet is compatible with CombineFileInputFormat
> so that I can read more files in one mapper. However, Cascading is still using
> the following class:
>
>
> https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/mapred/DeprecatedParquetInputFormat.java#L176
>
> This class implements the InputSplit interface (incompatible with
> CombineFileInputFormat) rather than extending the FileSplit class (compatible
> with CombineFileInputFormat).
>
> Would it be possible to remove this private class and use
> ParquetInputSplit instead?
>
> Thanks.
>
> --
> Ciao,
> TianYi ZHU
>


-- 
Ryan Blue
Software Engineer
Netflix
