Hi Hyunsung.

I know the schema versions are somewhat confusing, so here is my first
attempt to explain the differences.

1. version 1

We started with this simple but inefficient HBase schema.

- IndexEdge
 rowKey: (hash(srcVertex.innerId), srcVertex.innerId, labelWithDirection,
labelIndex, isInverted(false))
 qualifier: (serialized indexProps, tgtVertex.innerId)
 value: (serialized extraProps(not included in indexProps))

- SnapshotEdge
 rowKey: (hash(srcVertex.innerId), srcVertex.innerId, labelWithDirection,
labelIndex, isInverted(true))
 qualifier: (tgtVertex.innerId)
 value: (serialized all props(indexProps + extraProps))
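
To make the layout above concrete, here is a minimal sketch in Python. It
is only an illustration, not the S2Graph code: the Cell type, the
hash_prefix helper (a 2-byte MD5 prefix) and the string ids are all
assumptions, and plain tuples stand in for serialized bytes.

```python
import hashlib
from typing import NamedTuple


class Cell(NamedTuple):
    """One HBase cell, modelled as tuples instead of serialized bytes."""
    row_key: tuple
    qualifier: tuple
    value: tuple


def hash_prefix(inner_id: str) -> int:
    # Hypothetical stand-in for hash(...) in the row key.
    return int.from_bytes(hashlib.md5(inner_id.encode("utf-8")).digest()[:2], "big")


def index_edge_v1(src, tgt, label_with_dir, label_index, index_props, extra_props):
    # Every edge of one (src, label, index) lands on the same row;
    # the qualifier distinguishes the edges inside that wide row.
    row_key = (hash_prefix(src), src, label_with_dir, label_index, False)
    return Cell(row_key, (index_props, tgt), extra_props)


def snapshot_edge_v1(src, tgt, label_with_dir, label_index, index_props, extra_props):
    # Same row key apart from isInverted(true); the qualifier is only the target.
    row_key = (hash_prefix(src), src, label_with_dir, label_index, True)
    return Cell(row_key, (tgt,), index_props + extra_props)
```

With this model, two edges from the same source vertex share a rowKey and
differ only in the qualifier, which is what makes the row wide.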

The problem was that version 1 uses the InnerVal serialize/deserialize
scheme, which is custom and very limited.

2. version 2

Actually, version 1 and version 2 do not differ in terms of the HBase table
schema. The only difference between them is which InnerVal version is
used.

Version 2 uses HBase-common's OrderedBytes to encode/decode values so that
their byte order matches their natural order. This was introduced when
types.v2.InnerVal was added.
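
To show why an order-preserving encoding matters for a store that sorts
keys byte by byte, here is a small Python sketch. It does not reproduce
the OrderedBytes wire format; it only uses the common sign-bit-flip trick
for 64-bit integers, and the function names are made up for illustration.

```python
import struct


def encode_naive(n: int) -> bytes:
    # Big-endian two's complement: negative numbers sort AFTER positive ones
    # when the bytes are compared as unsigned values.
    return struct.pack(">q", n)


def encode_ordered(n: int) -> bytes:
    # Shift the signed range onto the unsigned range (equivalent to flipping
    # the sign bit) so that byte order equals numeric order.
    return struct.pack(">Q", n + (1 << 63))


def decode_ordered(b: bytes) -> int:
    # Inverse of encode_ordered.
    return struct.unpack(">Q", b)[0] - (1 << 63)


values = [5, -3, 0, 100, -200]
by_naive_bytes = sorted(values, key=encode_naive)
by_ordered_bytes = sorted(values, key=encode_ordered)
```

Sorting by the ordered encoding gives the same result as sorting the
numbers themselves, while the naive encoding puts the negative values
last.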

3. version 3

There were a few problems with the HBase table schema, especially for
SnapshotEdge.

Note that a SnapshotEdge is looked up by a given (srcVertex.innerId, label,
tgtVertex.innerId), which means random access. Compare this with IndexEdge,
which reads wide columns using Get.

If all the SnapshotEdges for a given vertex reside on the same row (and
therefore on the same region server), that row becomes a hotspot, which
makes HBase unhappy. Not only deleteAll but also consecutive
updates/deletes on the same vertex hit this hotspot.

To overcome this problem, a tall schema for SnapshotEdge was introduced in
version 3. Version 3 includes tgtVertex.innerId in the rowKey, which
spreads the SnapshotEdges of one vertex across multiple regions. Actually,
there is no reason to keep version 1 and version 2 for SnapshotEdge; they
exist only for backward compatibility. I think we should remove version 1
and version 2 in the future, after discussion.

- IndexEdge: same as version 1 and version 2

- SnapshotEdge
 rowKey: (hash(srcVertex.innerId + "," + tgtVertex.innerId),
srcVertex.innerId, tgtVertex.innerId, labelWithDirection, labelIndex,
isInverted(true))
 qualifier: empty bytes
 value: (serialized all props(indexProps + extraProps))

Note that tgtVertex.innerId is now included in the rowKey.
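
Here is a rough Python sketch of how the rowKey change spreads
SnapshotEdges. It is an illustration only: hash_prefix is a hypothetical
stand-in for the real hash, ids are plain strings, and tuples stand in for
serialized bytes.

```python
import hashlib


def hash_prefix(s: str) -> int:
    # Hypothetical stand-in for hash(...) in the row key.
    return int.from_bytes(hashlib.md5(s.encode("utf-8")).digest()[:2], "big")


def snapshot_row_key_v1(src, tgt, label_with_dir, label_index):
    # version 1 / version 2: tgt is NOT part of the row key.
    return (hash_prefix(src), src, label_with_dir, label_index, True)


def snapshot_row_key_v3(src, tgt, label_with_dir, label_index):
    # version 3: the hash covers src and tgt, and tgt is part of the row key.
    return (hash_prefix(src + "," + tgt), src, tgt, label_with_dir, label_index, True)


# One source vertex with 1000 different targets.
targets = [f"v{i}" for i in range(1000)]
rows_v1 = {snapshot_row_key_v1("u1", t, "friend_out", 0) for t in targets}
rows_v3 = {snapshot_row_key_v3("u1", t, "friend_out", 0) for t in targets}
prefixes_v3 = {key[0] for key in rows_v3}
```

In this model all 1000 edges fall on a single row under version 1, while
version 3 gives each edge its own row, and the hash prefixes differ, so the
rows can end up in different regions.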

4. version 4

Version 3 expects columnar storage, since it encodes edges into a wide row
using the qualifier.

There are a few people who want to use S2Graph with a different storage
engine that does not support columnar storage, such as Redis.
Also, when the number of edges from a given vertex is small, the wide-row
schema has no problems, but once the number of edges becomes large, HBase
is not happy, since the response of a Get becomes too large.

- IndexEdge:
 rowKey: (hash(srcVertex.innerId), srcVertex.innerId, labelWithDirection,
labelIndex, isInverted(false), serialized indexProps, tgtVertex.innerId)
 qualifier: empty bytes
 value: (serialized extraProps(not included in indexProps))

- SnapshotEdge: same as version 3

Note that we moved the qualifier contents into the rowKey to go from the
wide-row schema to the tall-row schema. Also, don't forget to read with a
Scanner instead of a Get.
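
As a sketch of what reading a tall-row IndexEdge looks like, the Python
below models the table as a sorted list of rowKeys and the Scanner as a
prefix scan with a limit. This is not the HBase client API; hash_prefix,
index_row_key_v4 and scan_prefix are hypothetical helpers, and tuples stand
in for serialized bytes.

```python
import bisect
import hashlib


def hash_prefix(s: str) -> int:
    # Hypothetical stand-in for hash(...) in the row key.
    return int.from_bytes(hashlib.md5(s.encode("utf-8")).digest()[:2], "big")


def index_row_key_v4(src, label_with_dir, label_index, index_props, tgt):
    # version 4: indexProps and tgt move from the qualifier into the row key.
    return (hash_prefix(src), src, label_with_dir, label_index, False, index_props, tgt)


def scan_prefix(sorted_keys, prefix, limit):
    # Jump to the first key >= prefix, then walk forward while the prefix
    # still matches, stopping early once `limit` rows have been collected.
    start = bisect.bisect_left(sorted_keys, prefix)
    out = []
    for key in sorted_keys[start:]:
        if key[:len(prefix)] != prefix or len(out) >= limit:
            break
        out.append(key)
    return out


# Edges of u1 (indexProps is a one-element timestamp tuple) plus one edge of u2.
table = sorted(
    [index_row_key_v4("u1", "friend_out", 0, (ts,), f"v{ts}") for ts in (4, 1, 5, 2, 3)]
    + [index_row_key_v4("u2", "friend_out", 0, (1,), "v9")]
)
prefix = (hash_prefix("u1"), "u1", "friend_out", 0, False)
first_three = scan_prefix(table, prefix, limit=3)
```

The point of the limit is that the reader can stop after a few rows, so
the size of the response no longer grows with the total number of edges of
the vertex.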


So at this point, it is strongly recommended to use version 3 or
version 4.
Some storage engines, such as Cassandra, can benefit from the wide-row
schema, while storage without columnar support can use the tall-row schema.


Thanks for asking this question; I hope this helps.

If any part is not clear, please do ask.

Best Regards.
DOYUNG YOON


On Mon, Apr 4, 2016 at 3:24 PM Hyunsung Jo <[email protected]> wrote:

> Hi all,
>
> What are the main differences of each version?
> I ran into this question while wondering which test cases v1 schema should
> support.
> For example, I understand that wide -> tall rows was the key change in v3
> -> v4 from S2GRAPH-50 <https://issues.apache.org/jira/browse/S2GRAPH-50>.
> Could someone elaborate on the other version changes?
>
> Thanks,
> Jo
>
> On Fri, Apr 1, 2016 at 5:01 PM Hyunsung Jo <[email protected]> wrote:
>
> > Hi all,
> >
> > Created S2GRAPH-62 <https://issues.apache.org/jira/browse/S2GRAPH-62>
> > regarding "1. The tests do not cover all schema versions".
> > (Assuming that silence implies consensus and more testing is good! Haha!)
> >
> > Disregarding "2. No option for different storages" for now since the
> > community is yet to agree on multi-storage support.
> >
> > Thanks,
> > Jo
> >
> > On Wed, Mar 23, 2016 at 7:24 AM Hyunsung Jo <[email protected]> wrote:
> >
> >> Hi,
> >>
> >> I'd like to address some issues with S2Graph test code.
> >>
> >> 1. The tests do not cover all schema versions:
> >> The latest version of S2Graph has four schema versions (v1 through v4).
> >> Yet, most of the current test cases only cover one or two versions
> >> (usually v2 or v3). Each test case should run against all versions that
> >> support the feature being tested.
> >>
> >> 2. No option for different storages:
> >> S2Graph has plans to support storages other than HBase (RocksDB, Redis,
> >> and so on). But test cases such as 'AsynchbaseStorageTest' aren't
> >> necessary for RocksDB. In this case, it would be more developer-friendly
> >> to provide an option to run only the tests that concern a given storage.
> >> In other words, if one is using S2Graph with RocksDB, she should be able
> >> to run the test cases that cover common or RocksDB-related features only,
> >> and skip the ones like 'AsynchbaseStorageTest'.
> >>
> >> 3. Mixed use of different testing styles:
> >> S2Graph uses ScalaTest which supports several testing styles (
> >> http://www.scalatest.org/user_guide/selecting_a_style). As is, multiple
> >> styles are used in the test code, FunSuite and FlatSpec. I think it's
> >> better to stick to one style.
> >>
> >>
> >> My suggestion regarding 1 and 2 is to use tags (
> >> http://www.scalatest.org/user_guide/tagging_your_tests).
> >> This way, you can configure the test suite to run selective test cases
> >> with a predefined SBT command.
> >> For example, let's say all the test cases are tagged according to their
> >> supported versions or storages, and Redis only supports schema version v4.
> >> In 'build.sbt', you can define that a predefined command such as
> >> 'redis:test' should only run the tests that are tagged either 'v4' or
> >> 'Redis'.
> >>
> >> Issue 3 is merely a matter of deciding which style to roll with and
> >> simply rewriting the code.
> >>
> >> Let me know what you think, and I'll open a JIRA ticket for this.
> >>
> >> Regards,
> >> Jo
> >>
> >>
> >>
>
