Here is a problem I encountered.

I created a label 'user_url_click' which stores a click log specifying who
clicked which url.
In many cases the clicked urls are very skewed, and the number of edges for
a very popular url becomes very large, which causes memstore flushes to
happen too often.

In my case there is actually no need to store the reverse direction (which
url was clicked by whom), since no query traverses from a url with direction
'in'. However, there is currently no way to skip it and avoid the frequent
memstore flushes.

So I think it would be better to provide extra options on the label so users
can avoid this problem if they know what they are doing.

Here is a list of extra options that I think might be helpful when storing
an edge.

1. skipReverse: skip storing the automatic reverse direction edge.
2. skipStoreVertex: skip storing the vertex when storing an edge.
3. skipStoreSnapshotEdge: skip storing the snapshotEdge when
consistencyLevel is weak.
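
To make the idea concrete, here is a rough sketch of how these three options
could gate what gets written for one edge insert. This is only a simplified
model, not the actual S2Graph write path: the class `LabelWriteOptions`, the
function `plan_writes`, and the record names are all hypothetical, and the
option fields simply mirror the names proposed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabelWriteOptions:
    # Hypothetical option names mirroring the proposal; not an existing API.
    skip_reverse: bool = False
    skip_store_vertex: bool = False
    skip_store_snapshot_edge: bool = False


def plan_writes(options: LabelWriteOptions, consistency_level: str) -> list:
    """Return the kinds of records a single edge insert would write."""
    writes = ["out_edge"]  # the forward edge is always stored
    if not options.skip_reverse:
        writes.append("in_edge")  # automatic reverse direction edge
    if not options.skip_store_vertex:
        writes.append("vertex")
    # The snapshot edge is only skipped when consistencyLevel is weak.
    skip_snapshot = options.skip_store_snapshot_edge and consistency_level == "weak"
    if not skip_snapshot:
        writes.append("snapshot_edge")
    return writes
```

With all three options enabled on a weak label, only the forward edge would
be written, which is what I want for 'user_url_click'.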

I also think it would be good to provide options that control how edges are
published into Kafka.
Currently there is only one flag, `isAsync`, on the label, which controls
which Kafka topic the edges of that label are published into.
I think an option to skip publishing into Kafka, or to publish only a
sample of the edges, would also be helpful.
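
As a rough illustration of the skip and sampling idea, the decision could
look something like the sketch below. The mode values ("all", "skip",
"sample"), the `sample_rate` parameter, and the function `should_publish` are
assumptions made up for this example; nothing like this exists on the label
today.

```python
import random


def should_publish(mode: str, sample_rate: float = 1.0, rng=random.random) -> bool:
    """Decide whether one edge is published into Kafka.

    mode is a hypothetical per-label setting: "all", "skip", or "sample".
    rng returns a float in [0, 1) and can be replaced to make tests repeatable.
    """
    if mode == "skip":
        return False  # never publish edges of this label
    if mode == "sample":
        return rng() < sample_rate  # publish roughly sample_rate of the edges
    return True  # "all": publish every edge
```

For example, `should_publish("sample", sample_rate=0.01)` would let roughly
1% of the edges of a very hot label through.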

I am wondering what other folks think.

Best Regards.
DOYUNG YOON
