[ https://issues.apache.org/jira/browse/PIG-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777408#comment-13777408 ]
Jacob Perkins commented on PIG-3453:
------------------------------------
[~boneill], I haven't thought too hard about distinct yet myself. Since I'm
really only thinking about Trident and not Storm in general, doing a distinct
strictly within a batch is one straightforward option. Unfortunately, from a
user standpoint, I think this would be (a) minimally useful and (b) confusing.
Instead, could we implement something like an approximate distinct using an LRU
cache? We could even go so far as to implement an SQF (Streaming Quotient
Filter, which I haven't read in its entirety yet):
http://www.vldb.org/pvldb/vol6/p589-dutta.pdf
Also, what about order by? In what sense is an unbounded stream ordered?
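To make the two distinct options concrete, here is a rough sketch of what I
mean. It is plain Python rather than Trident code, and every name in it
(LRUDistinct, lru_distinct, batch_distinct, the capacity parameter) is made up
for illustration; nothing here is an existing Pig or Storm API. The per-batch
version only dedupes inside each batch, while the LRU version dedupes across
the stream but can re-emit a key once it has been evicted from the cache.

```python
from collections import OrderedDict


class LRUDistinct:
    """Approximate distinct over an unbounded stream using a bounded LRU cache.

    Hypothetical helper: a key is reported as new if it is not currently in
    the cache, so a key that was evicted can be reported as new again.
    """

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._seen = OrderedDict()  # insertion order doubles as recency order

    def is_new(self, key):
        if key in self._seen:
            # Seen recently: refresh its recency and treat it as a duplicate.
            self._seen.move_to_end(key)
            return False
        self._seen[key] = None
        if len(self._seen) > self.capacity:
            # Evict the least recently used key to keep memory bounded.
            self._seen.popitem(last=False)
        return True


def lru_distinct(stream, capacity):
    """Yield each tuple that the LRU cache does not currently remember."""
    cache = LRUDistinct(capacity)
    for item in stream:
        if cache.is_new(item):
            yield item


def batch_distinct(batches):
    """Distinct strictly within each batch; nothing is remembered across batches."""
    for batch in batches:
        # dict.fromkeys keeps first-seen order while dropping duplicates.
        yield list(dict.fromkeys(batch))
```

With a capacity of 2, the stream a, b, a, c, b comes out as a, b, c, b: the
second "a" is dropped, but "b" is emitted twice because it was evicted when
"c" arrived. That is the sense in which the result is only approximate, and
the cache size is the knob that trades memory for accuracy.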
I absolutely do not want to tie the Storm/Trident execution engine to an
external data store such as Cassandra. Pig is supposed to be backend agnostic.
Maybe the -default- tap and sink can be Kafka (tap) and Cassandra (sink).
Finally, it should be possible to run a Pig script in Storm local mode.
And [~pradeepg26], I'm actually well on the way to having nested foreach
working. The way I'm working it now is that each LogicalExpressionPlan becomes
its own Trident BaseFunction. This actually works quite nicely for now. I
haven't gotten to aggregates yet. What I probably won't implement for the POC
is the tap and sink.
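As a loose illustration of the one-plan-per-function idea, here is a small
Python sketch. It is not the POC code and not the Trident API: ExpressionPlan,
PlanFunction and run_foreach are hypothetical stand-ins, and the assumption is
simply that each plan can be evaluated against a single tuple and that each
function appends its result to the tuple, the way Trident functions append
output fields.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ExpressionPlan:
    """Hypothetical stand-in for one expression plan of a foreach."""
    name: str
    evaluate: Callable[[tuple], object]  # computes one output value from a tuple


class PlanFunction:
    """Wraps a single plan as a per-tuple function that appends one field."""

    def __init__(self, plan):
        self.plan = plan

    def execute(self, tup):
        # Append the computed value, leaving the input fields in place.
        return tup + (self.plan.evaluate(tup),)


def run_foreach(stream, plans):
    """Chain one PlanFunction per plan, then project only the generated fields."""
    functions = [PlanFunction(plan) for plan in plans]
    for tup in stream:
        input_width = len(tup)
        out = tup
        for function in functions:
            out = function.execute(out)
        # Keep just the fields produced by the plans, in plan order.
        yield out[input_width:]
```

For example, with one plan that sums the first two fields and another that
upper-cases the third, the tuple (1, 2, "x") becomes (3, "X"). Each plan stays
independent of the others, which is what makes the one-function-per-plan
mapping easy to chain.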
> Implement a Storm backend to Pig
> --------------------------------
>
> Key: PIG-3453
> URL: https://issues.apache.org/jira/browse/PIG-3453
> Project: Pig
> Issue Type: New Feature
> Reporter: Pradeep Gollakota
> Labels: storm
>
> There is a lot of interest around implementing a Storm backend to Pig for
> streaming processing. The proposal and initial discussions can be found at
> https://cwiki.apache.org/confluence/display/PIG/Pig+on+Storm+Proposal