Hi all,

I agree that providing options to selectively store graph components can be
useful.
That said, if S2Graph decides to do this, I feel like the data model should
be better documented so that a general user can fully understand what each
option means.

Thanks,
Jo

On Wed, Apr 20, 2016 at 9:23 AM Jun Ki Kim <[email protected]> wrote:

> Sounds like a good idea!
>
> I especially love the optional Kafka publishing feature.
> Kafka is a good distributed queue for massive data, which is why people
> usually send their data to it.
> On the other hand, a single topic can hold too much data to handle or
> filter easily. I have been in a situation where I had to select and
> process just one label from a single topic, and I had to spend a lot of
> resources filtering out edges that were not my own.
> I am convinced that your "optional" feature will be helpful to S2Graph.
>
> Thanks for your suggestion!
>
> On Wed, Apr 20, 2016 at 9:14 AM DO YUNG YOON <[email protected]> wrote:
>
> > Here is the problem I encountered.
> >
> > I created a label 'user_url_click' that stores click logs recording who
> > clicked which URL.
> > In many cases the clicked URLs are very skewed, and the number of edges
> > for a very popular URL becomes very large, which causes memstore flushes
> > too often.
> >
> > In my case there is actually no need to store the reverse direction
> > (which URL was clicked by whom), since no query traverses from a URL
> > with direction 'in'. However, there is currently no way to skip it and
> > avoid the overly frequent memstore flushes.
> >
> > So I think it would be better to provide extra options on a label so
> > that users can avoid these problems if they know what they are doing.
> >
> > Here is a list of extra options that I think might be helpful for
> > storing edges.
> >
> > 1. skipReverse: skip storing the automatic reverse-direction edge.
> > 2. skipStoreVertex: skip storing the vertex when storing an edge.
> > 3. skipStoreSnapshotEdge: skip storing the snapshotEdge when
> > consistencyLevel is weak.
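
To make the three proposed options concrete, here is a minimal sketch of how they could decide what gets written for one incoming edge. It is only an illustration under stated assumptions: `LabelStoreOptions`, `planned_writes`, and the component names are hypothetical and are not part of S2Graph's API; only the option names and the "weak consistency" condition come from the proposal.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LabelStoreOptions:
    """Hypothetical per-label flags named after the proposal (not an S2Graph API)."""
    skip_reverse: bool = False
    skip_store_vertex: bool = False
    skip_store_snapshot_edge: bool = False


def planned_writes(options: LabelStoreOptions, consistency_level: str) -> List[str]:
    """Return the components that would be written for one incoming edge."""
    writes = ["edge:out"]  # assumption: the forward edge is always stored
    if not options.skip_reverse:
        writes.append("edge:in")  # the automatic reverse-direction edge
    if not options.skip_store_vertex:
        writes.append("vertex")
    # Per the proposal, the snapshot edge may only be skipped when the
    # label's consistency level is weak.
    skip_snapshot = options.skip_store_snapshot_edge and consistency_level == "weak"
    if not skip_snapshot:
        writes.append("snapshot_edge")
    return writes


# A click-log label that is never traversed with direction 'in'.
click_label = LabelStoreOptions(skip_reverse=True, skip_store_snapshot_edge=True)
print(planned_writes(click_label, "weak"))    # ['edge:out', 'vertex']
print(planned_writes(click_label, "strong"))  # ['edge:out', 'vertex', 'snapshot_edge']
```

With `skip_reverse` set, the skewed 'in' direction for popular URLs is never written, which is the case described above.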
> >
> > I also think it would be good to provide options to control how edges
> > are published to Kafka.
> > Currently there is only one flag on a label, `isAsync`, which controls
> > which Kafka topic the edges with that label are published to.
> > I think an option to skip publishing to Kafka, or to publish only a
> > sample of the edges, could also be helpful.
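
A skip-or-sample publishing decision could look roughly like the sketch below. The function name `should_publish`, its parameters, and the choice of hashing the edge key so that the decision is deterministic are all assumptions made for illustration; the proposal does not specify how sampling would be done, and no Kafka client is involved here.

```python
import zlib


def should_publish(edge_key: str, skip_publish: bool = False,
                   sample_rate: float = 1.0) -> bool:
    """Decide whether one edge should be published (hypothetical helper).

    skip_publish turns publishing off for the label; sample_rate in [0, 1]
    is the approximate fraction of edges to publish.
    """
    if skip_publish or sample_rate <= 0.0:
        return False
    if sample_rate >= 1.0:
        return True
    # Hash the key into one of 10,000 buckets so that the same edge key
    # always gets the same decision, regardless of which process handles it.
    bucket = zlib.crc32(edge_key.encode("utf-8")) % 10_000
    return bucket < sample_rate * 10_000


keys = ["user{}->url_a".format(i) for i in range(10_000)]
published = sum(should_publish(k, sample_rate=0.1) for k in keys)
print("published", published, "of", len(keys))
```

Hash-based sampling is used here instead of a random draw so that retries of the same edge are treated consistently; a random draw would be an equally valid reading of the proposal.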
> >
> > I am wondering what other folks think.
> >
> > Best Regards.
> > DOYUNG YOON
> >
>
