Sounds like a good idea!

I especially like the optional Kafka publishing feature.
Kafka is a good distributed queue for massive data, which is why people
usually send their data to it.
On the other hand, a single topic can carry too much data to handle or
filter. I ran into a situation where I only needed to select and process one
label from a single topic, and I had to spend a lot of resources filtering
out edges that were not mine.
I am convinced that your "optional" feature will be helpful to S2Graph.
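
To make the idea concrete, here is a minimal sketch in Python (not S2Graph code) of how a per-label option could decide whether an edge is published at all, so that consumers do not have to filter a whole topic afterwards. Everything in it is an assumption for illustration: the names PublishPolicy, should_publish, POLICIES and edges_to_publish, the "all"/"skip"/"sample" modes, and the label 'internal_debug' are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class PublishPolicy:
    """Hypothetical per-label publishing option (not an existing S2Graph setting)."""
    mode: str = "all"          # "all", "skip", or "sample"
    sample_rate: float = 1.0   # only consulted when mode == "sample"


def should_publish(policy, rng):
    """Return True if an edge governed by this policy should be published."""
    if policy.mode == "skip":
        return False
    if policy.mode == "sample":
        # Keep roughly sample_rate of the edges.
        return rng.random() < policy.sample_rate
    return True


# Hypothetical label -> policy table; labels without an entry publish everything.
POLICIES = {
    "user_url_click": PublishPolicy(mode="sample", sample_rate=0.1),
    "internal_debug": PublishPolicy(mode="skip"),
}


def edges_to_publish(edges, rng=None):
    """Filter edges (dicts with a 'label' key) down to the ones to publish."""
    if rng is None:
        rng = random.Random()
    default = PublishPolicy()
    return [e for e in edges
            if should_publish(POLICIES.get(e["label"], default), rng)]
```

With something like this, a skewed label such as 'user_url_click' could be sampled at the producer side instead of being filtered by every consumer.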

Thanks for your suggestion!

On Wed, Apr 20, 2016 at 9:14 AM, DO YUNG YOON <[email protected]> wrote:

> Here is the problem I encountered.
>
> I created a label 'user_url_click' that stores click logs specifying who
> clicked which url.
> In many cases, the clicked urls are very skewed and the # of edges for a very
> popular url becomes very large, which causes memstore flushes too often.
>
> Actually, there is no need to store the reverse direction (which url was
> clicked by whom) in my case, since there is no query traversing from url
> with direction 'in', but there is currently no way to skip it and avoid the
> overly frequent memstore flushes.
>
> So I think it would be better to provide extra options on the label so users
> can avoid these problems if they know what they are doing.
>
> Here is a list of extra options I think might be helpful for storing
> edges.
>
> 1. skipReverse: skip storing the automatic reverse direction edge.
> 2. skipStoreVertex: skip storing the vertex when storing an edge.
> 3. skipStoreSnapshotEdge: skip storing the snapshotEdge when consistencyLevel
> is weak.
>
> Also, I think it would be good if we could provide options to control how
> edges are published into kafka.
> There is only one flag, `isAsync`, on the label, which controls which kafka
> topic edges with a specific label are published into.
> I think providing an option to skip or sample when publishing into kafka
> could also be helpful.
>
> Wondering what other folks think.
>
> Best Regards.
> DOYUNG YOON
>
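
The three storage options proposed in the quoted message could be modelled roughly as follows. This is only a simplified sketch in Python, not S2Graph's actual write path: LabelOptions, planned_writes and the write names ("edge:out", "edge:in", "vertex", "snapshot_edge") are hypothetical, and the default set of writes is assumed from the description in the proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabelOptions:
    """Hypothetical per-label storage options mirroring the proposal."""
    skip_reverse: bool = False
    skip_store_vertex: bool = False
    skip_store_snapshot_edge: bool = False
    consistency_level: str = "strong"   # "strong" or "weak"


def planned_writes(opts):
    """List the writes that inserting one edge would trigger under opts."""
    writes = ["edge:out"]               # the forward edge is always stored
    if not opts.skip_reverse:
        writes.append("edge:in")        # automatic reverse direction edge
    if not opts.skip_store_vertex:
        writes.append("vertex")
    # Per the proposal, the snapshot edge may only be skipped when
    # the consistency level is weak.
    skip_snapshot = (opts.skip_store_snapshot_edge
                     and opts.consistency_level == "weak")
    if not skip_snapshot:
        writes.append("snapshot_edge")
    return writes
```

For the 'user_url_click' case, LabelOptions(skip_reverse=True) would drop the "edge:in" write, which is the one that concentrates on popular urls.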
