Sounds like a good idea! I especially like the optional Kafka publishing feature. Kafka works well as a distributed queue for massive data, which is why people usually send their data to it. On the other hand, a single topic can carry too much data to process or select from efficiently. I ran into a case where I only needed to select and process one label from one topic, and I had to spend a lot of resources filtering out edges that were not mine. I am convinced your "optional" feature will be helpful to S2Graph.
Thanks for your suggestion!

On Wed, Apr 20, 2016 at 9:14 AM, DO YUNG YOON <[email protected]> wrote:

> Here is the problem I encountered.
>
> I created a label 'user_url_click' which stores click logs specifying who
> clicked which url.
> In many cases, clicked urls are very skewed and the # of edges for a very
> popular url becomes very large, which triggers memstore flushes too often.
>
> Actually there is no need to store the reverse direction (which stores which
> url is clicked by whom) in my case, since there is no query traversing from
> url with direction 'in', but there is no way to skip it and avoid the
> too-frequent memstore flushes.
>
> So I think it would be better to provide extra options on the label so users
> can avoid these problems if they know what they are doing.
>
> Here is a list of extra options I think might be helpful regarding storing
> edges.
>
> 1. skipReverse: skip storing the automatic reverse direction edge.
> 2. skipStoreVertex: skip storing the vertex when storing an edge.
> 3. skipStoreSnapshotEdge: skip storing the snapshotEdge when
>    consistencyLevel is weak.
>
> Also I think it would be good if we could provide options to control how
> edges are published into kafka.
> There is only one flag `isAsync` on the label, which controls which kafka
> topic edges with a specific label are published into.
> I think providing an option to skip or sample when publishing into kafka
> could also be helpful.
>
> Wondering what other folks think.
>
> Best Regards.
> DOYUNG YOON
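
To make the options quoted above a bit more concrete, here is a rough Python sketch of how per-label flags might drive what gets stored and published. Everything in it is an assumption for discussion only: `LabelOptions`, `plan_writes`, `should_publish`, the `publish_sample_rate` field, and the record names are hypothetical and are not S2Graph's actual schema or API. Only the three skip flags and the "weak" consistency condition come from the proposal itself.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LabelOptions:
    """Hypothetical per-label options; not S2Graph's real label schema."""
    skip_reverse: bool = False
    skip_store_vertex: bool = False
    skip_store_snapshot_edge: bool = False
    # Assumed knob: 0.0 skips Kafka entirely, 1.0 publishes every edge.
    publish_sample_rate: float = 1.0


def plan_writes(options: LabelOptions, consistency_level: str) -> List[str]:
    """Return the (illustrative) records written for one incoming edge."""
    writes = ["edge_out"]
    if not options.skip_reverse:
        writes.append("edge_in")  # the automatic reverse-direction edge
    if not options.skip_store_vertex:
        writes.append("vertex")
    # Per the proposal, the snapshot edge is skipped only under weak consistency.
    if not (options.skip_store_snapshot_edge and consistency_level == "weak"):
        writes.append("snapshot_edge")
    return writes


def should_publish(options: LabelOptions, rng: random.Random) -> bool:
    """Decide by random sampling whether one edge is published to Kafka."""
    return rng.random() < options.publish_sample_rate
```

For a label like 'user_url_click', something like `LabelOptions(skip_reverse=True, publish_sample_rate=0.1)` would drop the reverse edge and publish roughly one edge in ten, which is the kind of control that would have saved me the filtering work on the consumer side.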
