[
https://issues.apache.org/jira/browse/CRUNCH-315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13858348#comment-13858348
]
Chao Shi commented on CRUNCH-315:
---------------------------------
Thanks, Josh. +1 for your patch (I'm not familiar with Spark, but your code
seems very straightforward).
I agree that we will have to serialize data onto disk, and thus the
implementation will be simple. So the only question is whether we really
need it. If we are not quite sure, we can add only the "empty collection"
here and wait for someone to propose a real use case for it.
> Empty collection
> ----------------
>
> Key: CRUNCH-315
> URL: https://issues.apache.org/jira/browse/CRUNCH-315
> Project: Crunch
> Issue Type: New Feature
> Reporter: Chao Shi
> Attachments: CRUNCH-315.patch
>
>
> As discussed on the mailing list [1] and [2], I'd like to add an empty
> collection feature. On the API side, I think we can add a new method in
> Pipeline to create an empty collection. The collection should be a subclass
> of PCollection and behave like other normal PCollections. There are also
> some optimization points that Josh mentioned in [2].
> I haven't thought it through clearly. I'm just filing a ticket here to see
> if anyone else has a better idea.
> [1]
> http://mail-archives.apache.org/mod_mbox/crunch-dev/201312.mbox/%3CBLU0-SMTP1337A04FAC6B5F497F7473EADC10%40phx.gbl%3E
> [2]
> http://mail-archives.apache.org/mod_mbox/crunch-dev/201312.mbox/%3CCAH29n6MbSK9gapoC2DgVnhofjAobyasCuZh_0475DuSajV%3DCPg%40mail.gmail.com%3E