Thanks for the explanation, Kostas. I am hoping we can keep the Flink APIs (i.e., the operator functions) clean and hide all the Tez nitty-gritty in the plan execution =)
- Henry

On Tue, Jun 24, 2014 at 5:05 AM, Kostas Tzoumas <[email protected]> wrote:
> Henry,
>
> I am currently travelling and will be able to write more about this next week.
> The idea is to use Tez as the distributed engine, and port Flink's runtime
> operators (for joins, aggregation, etc.) on top of that. The Flink APIs and
> optimizer should not need many changes. This should in theory be possible
> for the non-iterative parts of Flink. Filip has started an early effort of
> getting a WordCount that uses Stratosphere types and operators to run on
> top of Tez:
> https://github.com/filiphaase/incubator-tez/tree/stratosphere-input-output-proto1/tez-mapreduce-examples/src/main/java/org/apache/tez/stratosphere
>
> Kostas
>
>
> On Tue, Jun 24, 2014 at 12:33 AM, Henry Saputra <[email protected]>
> wrote:
>
>> I am interested to see how Flink integrates with Apache Tez. Does anyone
>> have a reference, JIRA, or doc showing how far the ongoing effort
>> has come?
>>
>>
>> Thanks,
>>
>> - Henry
>>
>> On Fri, Jun 20, 2014 at 9:25 AM, Kostas Tzoumas
>> <[email protected]> wrote:
>> > Hi Folks,
>> >
>> > After talking with Stephan, Fabian, Robert, and Ufuk, we gathered a few
>> > project ideas that people have been throwing around. These do not
>> > immediately classify as issues as they are major extensions of Flink (some
>> > might classify as completely different projects). These would make nice
>> > standalone implementation projects, for example for University theses. Some
>> > of them also require research and architecture work.
>> >
>> > The relevance to this mailing list is that perhaps someone is interested in
>> > picking up such a project.
>> >
>> > Here is the idea dump:
>> >
>> > ---------------
>> >
>> > Domain-specific language for graph processing: Create a GraphDataSet that
>> > abstracts away the internal representation of a graph and operations on the
>> > GraphDataSet. The project involves gathering requirements for graph
>> > processing functionality, architecting the DSL, implementation, and
>> > possible work on optimizing the operations when a graph operation can be
>> > mapped to different DataSet to DataSet transformations.
>> >
>> > Distributed mutable state: Currently, delta iterations internally use a hash
>> > index to store the state of the iteration, and they invoke index merging
>> > functionality. One idea would be to surface an operator (with care) to the
>> > APIs that essentially allows mutable state manipulations. Another idea
>> > would be to implement something along the lines of a parameter server and
>> > make such functionality accessible to the APIs.
>> >
>> > Domain-specific language for spatial data: Create spatial data types
>> > (point, region, etc.) and operations on them
>> >
>> > Integration into Apache BigTop
>> >
>> > Integration with Apache Ambari
>> >
>> > Pig frontend for Flink: An initial effort was here:
>> > http://kth.diva-portal.org/smash/get/diva2:539046/FULLTEXT01.pdf
>> >
>> > Cascading on Flink
>> >
>> > Optimizing the integration with columnar file formats (Parquet, ORCFile)
>> > and perhaps eventually pushing filters down to data scans.
>> >
>> > Statistical operators to extract statistical information from a DataSet
>> > (e.g., histograms of value distributions)
>> >
>> > Integration with Apache Mahout (ongoing effort)
>> >
>> > Integration with Apache Tez (ongoing effort)
>> >
>> > Flink Streaming (ongoing effort)
>> >
>> > Eclipse plugin that includes functionality for execution plan debugging
>> >
>> > Local execution of programs using Java Collections
>> >
>> > ---------------
>> >
>> > Feel free to extend the descriptions that are empty and to extend this list.
>> >
>> > Do you think that these would qualify as JIRA tickets classified as
>> > "wishes"?
>> >
>> > Kostas
>>
