Flink is not really well suited for interactive/ad-hoc processing. What could work is to use a local tool to identify the transformation rules and then apply them with Flink to a large data set. But that's probably not what you are looking for, right?
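
As a rough sketch of that split: the rules are worked out interactively on a small local sample, then the same per-record function is handed to a batch job. The rule format, the `RULES` list and the `apply_rules` name are all made up for illustration; none of them is an OpenRefine or Flink API, and the Flink side is not shown.

```python
# Hypothetical rule format: each rule names a column and an operation.
# In this sketch the rules are assumed to come out of an interactive
# session on a small local sample of the data.
RULES = [
    {"column": "city", "op": "trim"},
    {"column": "city", "op": "titlecase"},
    {"column": "country", "op": "replace", "old": "U.S.", "new": "US"},
]

# Each operation takes the current cell value and the rule that triggered it.
OPS = {
    "trim": lambda value, rule: value.strip(),
    "titlecase": lambda value, rule: value.title(),
    "replace": lambda value, rule: value.replace(rule["old"], rule["new"]),
}


def apply_rules(record, rules=RULES):
    """Apply every rule, in order, to one record and return a new dict.

    The function is pure and only looks at a single record, so the same
    logic could be used as the body of a per-record map step in a batch job.
    """
    result = dict(record)
    for rule in rules:
        column = rule["column"]
        # Leave records alone if the column is missing or empty.
        if result.get(column) is not None:
            result[column] = OPS[rule["op"]](result[column], rule)
    return result


# Local check on a tiny sample before running against the full data set.
sample = [{"city": "  new york ", "country": "U.S."}]
cleaned = [apply_rules(record) for record in sample]
```

Because `apply_rules` keeps no state between records, nothing in it depends on the interactive tool once the rules are fixed; undo/redo would stay on the local side, as edits to the rule list.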
Best,
Fabian

2017-06-15 3:11 GMT+02:00 qi cui <cuiqi1...@gmail.com>:

> Hi Andrew,
> It would be great if you could come up with something to show the idea.
> There are lots of wiki pages on GitHub you can refer to (including the
> server-side architecture and client-side architecture). The unique feature
> of OpenRefine is that it lets the user interact with the system and
> point-and-click to wrangle the data set. But I think this is also the
> hardest part to migrate/refactor to another system like Flink or Spark.
> Put another way, OpenRefine deals with finite data sets, while Flink/Spark
> deal with infinite data sets with high velocity and variation.
> They have some intersection, of course. If we adopt a new
> framework like Flink, we may have to consider what to give up and what
> should be kept.
>
> Jacky
>
> On Wed, Jun 14, 2017 at 2:25 PM, Andrew Psaltis <psaltis.and...@gmail.com>
> wrote:
>
> > Thad,
> > Based on your description that OpenRefine uses techniques similar to
> > Zeppelin's, I *think* the reading and writing will work.
> >
> > I am fuzzy on the Undo/Redo.
> >
> > I will try over the next couple of days and see if I can make something
> > like this work (at least a trivial use case). Personally I think it would
> > be cool to allow business users to wrangle data with OpenRefine with the
> > power of Flink behind it.
> >
> > On Wed, Jun 14, 2017 at 7:53 PM, Thad Guidry <thadgui...@gmail.com>
> > wrote:
> >
> >> Andrew,
> >>
> >> So your idea is that Flink could be used as a storage abstraction layer
> >> for OpenRefine? Where OpenRefine would use TableSources for reading and
> >> TableSinks for writing?
> >> And would that still work with our concept of Undo/Redo in OpenRefine,
> >> using Flink's Savepoints in concert with TableSources and TableSinks?
> >> That last part is where I am reading Flink docs now and still seeing a
> >> lot of fuzziness, which worries me.
> >>
> >> -Thad
> >> +ThadGuidry <https://www.google.com/+ThadGuidry>
> >
> > --
> > Thanks,
> > Andrew
> >
> > Subscribe to my book: Streaming Data <http://manning.com/psaltis>
> > <https://www.linkedin.com/pub/andrew-psaltis/1/17b/306>
> > twitter: @itmdata <http://twitter.com/intent/user?screen_name=itmdata>