[jira] [Updated] (S2GRAPH-50) Provide new HBase Storage Schema

DOYUNG YOON (JIRA) Sun, 28 Feb 2016 21:56:13 -0800

     [ 
https://issues.apache.org/jira/browse/S2GRAPH-50?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


DOYUNG YOON updated S2GRAPH-50:
-------------------------------
    Description: 
I think we need to offer a choice between `Tall` and `Wide` row schemas for 
IndexEdge. The key difference between the two is as follows.

# Wide. Adjacent edges are stored in a single row with wide columns, and we use 
a get request to fetch them. This is how IndexEdge is currently stored.

# Tall. Adjacent edges are stored in multiple `consecutive` rows, and we use a 
scanner to scan through them. 
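
To make the two layouts concrete, here is a minimal in-memory sketch in Python. It is not S2Graph or HBase code: the dict standing in for a table, the `\x00` separator, and all function names are assumptions made for illustration only.

```python
from bisect import bisect_left

# Hypothetical stand-in for an HBase table: row key -> {qualifier: value}.
# Keys are bytes so that sorting matches lexicographic byte order.

def put_wide(table, src, edge_key, value):
    """Wide: one row per source vertex, one column per adjacent edge."""
    table.setdefault(src, {})[edge_key] = value

def get_wide(table, src):
    """A single get on the row returns every adjacent edge, sorted by qualifier."""
    row = table.get(src, {})
    return [(q, row[q]) for q in sorted(row)]

def put_tall(table, src, edge_key, value):
    """Tall: one row per edge; the source vertex is the row-key prefix.
    Assumes src never contains the \\x00 separator."""
    table[src + b"\x00" + edge_key] = {b"v": value}

def scan_tall(table, src):
    """A prefix scan walks the consecutive rows that share the source prefix."""
    prefix = src + b"\x00"
    keys = sorted(table)
    out = []
    for k in keys[bisect_left(keys, prefix):]:
        if not k.startswith(prefix):
            break  # left the block of consecutive rows for this vertex
        out.append((k[len(prefix):], table[k][b"v"]))
    return out
```

Both reads return the same edges in the same order; only the physical layout and the access pattern (get versus scan) differ.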

Once S2GRAPH-17 is resolved, I think the only thing we have to do is provide 
`IndexEdgeSerializer` and `IndexEdgeDeserializer` for the Tall row schema on 
HBase. I think this is a fairly trivial task since we already have all the 
primitives for it. 

The hard part would be changing the client interface.

Currently, queries support `offset` and `limit` for pagination. If we use a 
scanner, there is no easy way to support `offset`. 
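
The pagination problem can be sketched as follows, again in Python with a hypothetical in-memory `scan` helper rather than a real HBase scanner. `page_by_offset` shows why `offset` is awkward on a scanner: the skipped rows still have to be read. `page_by_cursor` is one possible alternative that resumes from the last key seen; it is an assumption added for illustration, not something proposed in this issue.

```python
from itertools import islice

def scan(rows, start_key=b""):
    """Hypothetical scanner: yields (row_key, value) in key order, starting at
    start_key. A real scanner would seek to start_key instead of filtering."""
    for key in sorted(rows):
        if key >= start_key:
            yield key, rows[key]

def page_by_offset(rows, offset, limit):
    """Emulates offset on a scanner: the first `offset` rows are read and dropped."""
    return list(islice(scan(rows), offset, offset + limit))

def page_by_cursor(rows, last_key, limit):
    """Resumes right after last_key; last_key + b'\\x00' is the smallest byte
    string that sorts strictly after last_key."""
    start = b"" if last_key is None else last_key + b"\x00"
    return list(islice(scan(rows, start), limit))
```

With offset-based paging the cost of a page grows with `offset`, while cursor-based paging would require the client to pass back the last key, which is the kind of client interface change mentioned above.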

I think it is worth trying the Tall row schema and benchmarking it against the 
Wide row schema. I also think this would be very useful for others who are 
interested in implementing other storage backends such as RocksDB or LevelDB 
(including myself). 

I will follow up with a benchmark of both the `Tall` and `Wide` row schemas, and 
then we can decide which schema should be the default. What do others think? 

  was:
I think we need to provide choice for both for `Tall` and `Wide` row for 
IndexEdge. The fatal difference between these two would be following.

# Wide.

if we store adjacent edges on single row with wide column and use get request 
to get adjacent edges. This is how IndexEdge is currently stored.

# Tall.

adjacent edges are on multiple `consecutive` rows and we use scanner to scan 
through them. 

once S2GRAPH-17 is resolved, then I think only thing we have to do is provide 
`IndexEdgeSerializer` and `IndexEdgeDeserializer` for Tall row schema on HBase 
and I think this is very trivial task since we all have primitives for this. 

The hard part would be changing interface for client.

currently query support `offset` and `limit` for pagination. if we use scanner, 
then there is no easy way to support `offset`. 

I think it is worth to try with Tall row schema and benchmark them over Wide 
row schema. also I think this is very beneficial for others who is interested 
in implementing other storage such as RocksDB or LevelDB(including myself). 

I will followup with benchmark on both `Tall` and `Wide` row then we can decide 
what schema should be default. What others think? 


> Provide new HBase Storage Schema
> --------------------------------
>
>                 Key: S2GRAPH-50
>                 URL: https://issues.apache.org/jira/browse/S2GRAPH-50
>             Project: S2Graph
>          Issue Type: New Feature
>            Reporter: DOYUNG YOON
>            Assignee: DOYUNG YOON
>              Labels: benchmark, schema, serde
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> I think we need to offer a choice between `Tall` and `Wide` row schemas for 
> IndexEdge. The key difference between the two is as follows.
> # Wide. Adjacent edges are stored in a single row with wide columns, and we 
> use a get request to fetch them. This is how IndexEdge is currently stored.
> # Tall. Adjacent edges are stored in multiple `consecutive` rows, and we use a 
> scanner to scan through them. 
> Once S2GRAPH-17 is resolved, I think the only thing we have to do is provide 
> `IndexEdgeSerializer` and `IndexEdgeDeserializer` for the Tall row schema on 
> HBase. I think this is a fairly trivial task since we already have all the 
> primitives for it. 
> The hard part would be changing the client interface.
> Currently, queries support `offset` and `limit` for pagination. If we use a 
> scanner, there is no easy way to support `offset`. 
> I think it is worth trying the Tall row schema and benchmarking it against the 
> Wide row schema. I also think this would be very useful for others who are 
> interested in implementing other storage backends such as RocksDB or LevelDB 
> (including myself). 
> I will follow up with a benchmark of both the `Tall` and `Wide` row schemas, 
> and then we can decide which schema should be the default. What do others 
> think? 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
