Thank you for writing down the ideas. I don't think we should open JIRAs for them. I would prefer to put the list on the website or on a wiki (once we have one).
On Fri, Jun 20, 2014 at 6:25 PM, Kostas Tzoumas <[email protected] > wrote:
> Hi Folks,
>
> After talking with Stephan, Fabian, Robert, and Ufuk, we gathered a few
> project ideas that people have been throwing around. These do not
> immediately classify as issues as they are major extensions of Flink (some
> might classify as completely different projects). These would make nice
> standalone implementation projects, for example for University theses. Some
> of them also require research and architecture work.
>
> The relevance to this mailing list is that perhaps someone is interested in
> picking up such a project.
>
> Here is the idea dump:
>
> ---------------
>
> Domain-specific language for graph processing: Create a GraphDataSet that
> abstracts away the internal representation of a graph and operations on the
> GraphDataSet. The project involves gathering requirements for graph
> processing functionality, architecting the DSL, implementation, and
> possible work on optimizing the operations when a graph operation can be
> mapped to different DataSet to DataSet transformations.
>
> Distributed mutable state: Currently delta iterations use internally a hash
> index to store the state of the iteration, and they invoke index merging
> functionality. One idea would be to surface an operator (with care) to the
> APIs that essentially allows mutable state manipulations. Another idea
> would be to implement something along the lines of a parameter server and
> make such functionality accessible to the APIs.
>
> Domain-specific language for spatial data: Create spatial data types
> (point, region, etc) and operations thereof
>
> Integration into Apache BigTop
>
> Integration with Apache Ambari
>
> Pig frontend for Flink: An initial effort was here:
> http://kth.diva-portal.org/smash/get/diva2:539046/FULLTEXT01.pdf
>
> Cascading on Flink
>
> Optimizing the integration with columnar file formats (Parquet, ORCFile)
> and perhaps eventually pushing filters down to data scans.
>
> Statistical operators to extract statistical information from a DataSet
> (e.g., histograms of value distributions)
>
> Integration with Apache Mahout (ongoing effort)
>
> Integration with Apache Tez (ongoing effort)
>
> Flink Streaming (ongoing effort)
>
> Eclipse plugin that includes functionality for execution plan debugging
>
> Local execution of programs using Java Collections
>
> ---------------
>
> Feel free to extend the descriptions that are empty and to extend this
> list.
>
> Do you think that these would qualify as JIRA tickets classified as
> "wishes"?
>
> Kostas
>
