Hi all, I agree that providing options to selectively store graph components can be useful. That said, if S2Graph decides to do this, I feel like the data model should be better documented so that a general user can fully understand what each option means.
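To make that concrete, here is a rough sketch of how the semantics of the options proposed in the quoted message could be spelled out. This is not S2Graph code: `LabelOptions`, `plan_writes`, `should_publish`, and the `kafka_sample_rate` field are hypothetical names made up for illustration, and the component names in the returned list are just labels.

```python
from dataclasses import dataclass
import random


@dataclass(frozen=True)
class LabelOptions:
    """Hypothetical per-label options; only the three skip* names come from the proposal."""
    skip_reverse: bool = False
    skip_store_vertex: bool = False
    skip_store_snapshot_edge: bool = False
    kafka_sample_rate: float = 1.0  # assumption: 1.0 publishes everything, 0.0 skips publishing


def plan_writes(options, consistency_level="weak"):
    """Return the components that would be written when one edge is stored."""
    writes = ["edge:out"]
    if not options.skip_reverse:
        writes.append("edge:in")  # the automatic reverse-direction edge
    if not options.skip_store_vertex:
        writes.append("vertex")
    # The proposal only allows skipping the snapshot edge when consistency is weak.
    if not (options.skip_store_snapshot_edge and consistency_level == "weak"):
        writes.append("snapshotEdge")
    return writes


def should_publish(options, rng=random.random):
    """Decide by sampling whether one edge is published to Kafka."""
    return rng() < options.kafka_sample_rate
```

For the 'user_url_click' case described below, usage might look like this:

```python
click_label = LabelOptions(skip_reverse=True, skip_store_snapshot_edge=True, kafka_sample_rate=0.1)
print(plan_writes(click_label))            # ['edge:out', 'vertex']
print(plan_writes(click_label, "strong"))  # ['edge:out', 'vertex', 'snapshotEdge']
```

A table like this in the docs, one row per option and consistency level, would probably be enough for a general user.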
Thanks,
Jo

On Wed, Apr 20, 2016 at 9:23 AM Jun Ki Kim <[email protected]> wrote:

> Sounds like a good idea!
>
> I especially love the optional Kafka publishing feature.
> Kafka is a good distributed queue for massive data, which is why people
> usually send their data to Kafka.
> On the other hand, a single topic can carry too much data to handle or
> select from. I ran into a situation where I only needed to select and
> process one label from a topic, and I had to spend a lot of resources
> filtering out edges that were not mine.
> I am convinced your "optional" feature will be helpful to S2Graph.
>
> Thanks for your suggestion!
>
> On Wed, Apr 20, 2016 at 9:14 AM, DO YUNG YOON <[email protected]> wrote:
>
> > Here is a problem I encountered.
> >
> > I created a label 'user_url_click' which stores click logs specifying
> > who clicked which url.
> > In many cases, clicked urls are very skewed, and the # of edges for a
> > very popular url becomes very large, which causes memstore flushes too
> > often.
> >
> > Actually, there is no need to store the reversed direction (which url
> > is clicked by whom) in my case, since there is no query traversing from
> > url with direction 'in', but there is no way to skip it to avoid the
> > frequent memstore flushes.
> >
> > So I think it would be better to provide extra options on the label so
> > users can avoid these problems if they know what they are doing.
> >
> > Here is a list of extra options I think might be helpful for storing
> > edges.
> >
> > 1. skipReverse: skip storing the automatic reverse direction edge.
> > 2. skipStoreVertex: skip storing the vertex when storing an edge.
> > 3. skipStoreSnapshotEdge: skip storing the snapshotEdge when
> > consistencyLevel is weak.
> >
> > Also, I think it would be good if we could provide options to control
> > how edges are published into Kafka.
> > There is only one flag, `isAsync`, on the label, which controls which
> > Kafka topic edges with a specific label are published into.
> > I think providing an option to skip or sample when publishing into
> > Kafka could also be helpful.
> >
> > Wondering what other folks think.
> >
> > Best Regards.
> > DOYUNG YOON
> >
>
