Re: JSONiq data model

Till Westmann Mon, 09 May 2016 22:18:29 -0700


On 9 May 2016, at 12:02, Preston Carman wrote:

I think we have three options: optimize for space, keys (jdm:keys) or
field lookup (jdm:value). The optimization for keys and field lookup
could be done independently. Let's consider the option currently in the
wiki as option 1 (space). Don't remove this option from the wiki so we
have a reference. The new options for keys and field lookup can be
added as option 2 and 3.

Option 1 (space): A tightly compact format that is optimized to save space.

Option 2 (keys): A data model optimized for accessing a list of keys.

Option 3 (lookup): A data model optimized for accessing a field in the object.


For option 2 (keys):

Consider the return value for jdm:keys: jdm:keys($o as object()) as xs:string*

I am not sure I fully understand what xs:string* represents. Is this a
sequence of strings as in XQuery, an array in JSONiq, or some other
structure? The most efficient way to return the keys would be to store
them in the same way they should be returned. This way you can do a
simple copy to produce the result without processing the result. In
this case, storing them as a sequence (or array) of string values
might be the best option. The values would then need to be a separate
sequence (or array) of typed values in the object data model. Pro:
easy keys function. Con: added a list of offsets for the keys.


xs:string* is indeed a sequence of strings

For option 3 (lookup):
This option is independent of option 2. As Till suggested we can
implement this at a later date. We would need a method to improve the
lookup of a field. Options 1 and 2 require a sequential search of the
keys and a string comparison at each field. The AsterixDB record data
model is a little more complex than I first thought. Take a look at
their record implementation: writing the record [1] (lines 205 to 245
are interesting) and field lookup [2] (lines 277 to 344). We only
need to consider the open part of the record. (The closed part can be
ignored.)


I had another idea for the implementation of the dictionary. We could
store the keys in sorted order - while we store the values in the original
order. If each key is then followed by the offset to the value, we would
get

a) a log n access for a value (as the keys are sorted and we can do binary
   search) and
b) the keys in their original order, if we sort them by the offsets.

Assuming that the value() access is quite a bit more common than the
keys() access this could be a reasonable trade-off.

Comments?
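
To make the sorted-keys idea above concrete, here is a minimal in-memory Python sketch. It is only an illustration under stated assumptions: the names build_object, value and keys are hypothetical, the "offset" is simply an index into a list of values rather than a byte offset, and keys are assumed to be unique. It is not the VXQuery or AsterixDB implementation.

```python
from bisect import bisect_left


def build_object(pairs):
    """Build the two parts from (key, value) pairs given in original order.

    Assumption: keys are unique within one object.
    """
    # Values stay in their original order.
    values = [val for _, val in pairs]
    # Each dictionary entry is (key, offset), sorted by key.
    # Here the offset is just the position of the value in `values`.
    sorted_keys = sorted((key, offset) for offset, (key, _) in enumerate(pairs))
    return sorted_keys, values


def value(obj, name):
    """Look up one field with a binary search over the sorted keys (log n)."""
    sorted_keys, values = obj
    # (name, -1) sorts before any real entry for `name`, so this finds
    # the first entry whose key is >= name.
    pos = bisect_left(sorted_keys, (name, -1))
    if pos < len(sorted_keys) and sorted_keys[pos][0] == name:
        return values[sorted_keys[pos][1]]
    return None


def keys(obj):
    """Return the keys in their original order by sorting on the offsets."""
    sorted_keys, _ = obj
    return [key for key, _ in sorted(sorted_keys, key=lambda entry: entry[1])]
```

With this shape, value() costs a binary search plus one hop, while keys() pays for a sort by offset, which matches the trade-off described above if value() is the more common access.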


Sounds good to list the options on the Wiki page.

Also, what is the actual result of jdm:keys?


A sequence of strings.

What is the requirement for the initial implementation?


It should be correct and tested.

My 2c,
Till

[1] https://github.com/apache/incubator-asterixdb/blob/master/asterixdb/asterix-om/src/main/java/org/apache/asterix/builders/RecordBuilder.java
[2] https://github.com/apache/incubator-asterixdb/blob/master/asterixdb/asterix-om/src/main/java/org/apache/asterix/dataflow/data/nontagged/serde/ARecordSerializerDeserializer.java
On Mon, May 9, 2016 at 8:35 AM, Riyafa Abdul Hameed
<[email protected]> wrote:
Hi,
Is there any documentation I could go through to understand the AsterixDB
hash code implementation on the open fields? I am not sure I understand
enough from the AsterixDB serialization [1] to define the data model for
objects using it.

Sorry about any confusion.

[1]
https://cwiki.apache.org/confluence/display/ASTERIXDB/AsterixDB+Object+Serialization+Reference

Thank you.
Riyafa

On 9 May 2016 at 20:16, Michael J. Carey <[email protected]> wrote:
I think Preston's suggestion of looking at the AsterixDB implementation of
its binary data model is a good one, as it shares the efficient field
access by name requirements and several VXQuery folks are experts in its
details as well. I believe it uses a sorted list instead of a hash table
internally, perhaps - slightly simpler for updates perhaps.
On May 9, 2016 7:35 AM, "Riyafa Abdul Hameed" <[email protected]>
wrote:

Hi again,
I have been thinking of Till's suggestion of using a dictionary, and I
think it would be a better alternative because then we wouldn't have to
process the value tag of the value of a particular key before moving to the
next key. Hence it would be easy to implement the jdm:keys method. Any
suggestions? Shall I update the wiki and the doc based on this?

Thank you.
Riyafa
On 9 May 2016 at 19:21, Riyafa Abdul Hameed <[email protected]>
wrote:
Hi Till,
Currently I have suggested storing each key followed by the value. This
uses less space and is quite similar to storing the offset of the values
and the access is also linear to the number of keys.
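
As a rough illustration of the interleaved layout described above (each key immediately followed by its value), here is a minimal Python sketch. The 2-byte key length prefix, the 4-byte value length prefix and the function names are assumptions made only for this example. They are not the VXQuery format, where the value tag would determine how a value is skipped.

```python
import struct


def encode_interleaved(pairs):
    """Encode (key: str, value: bytes) pairs as key, value, key, value, ..."""
    out = b""
    for key, val in pairs:
        key_bytes = key.encode("utf-8")
        # Assumed layout: 2-byte key length, key, 4-byte value length, value.
        out += struct.pack(">H", len(key_bytes)) + key_bytes
        out += struct.pack(">I", len(val)) + val
    return out


def scan_pairs(buf):
    """Walk the buffer sequentially, yielding (key, value) pairs."""
    pos = 0
    while pos < len(buf):
        (key_len,) = struct.unpack_from(">H", buf, pos)
        key = buf[pos + 2:pos + 2 + key_len].decode("utf-8")
        pos += 2 + key_len
        (val_len,) = struct.unpack_from(">I", buf, pos)
        yield key, buf[pos + 4:pos + 4 + val_len]
        # Every value has to be stepped over before the next key is reached.
        pos += 4 + val_len


def keys_interleaved(buf):
    return [key for key, _ in scan_pairs(buf)]


def value_interleaved(buf, name):
    # Linear in the number of keys, with a string comparison at each field.
    for key, val in scan_pairs(buf):
        if key == name:
            return val
    return None
```

No offset table is stored, which is where the space saving comes from, but both keys and value lookups have to step over every preceding value.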

Thanks.
Riyafa

On 9 May 2016 at 18:54, Till Westmann <[email protected]> wrote:
All of this looks pretty good!
Wrt. the question of the dictionary for the fields, I think that we should
consider the 2 ways that we can access an object:
1. Either we get all keys (jdm:keys) or
2. we get a value for a key (jdm:value).

To get all the keys efficiently and to be able to skip huge nested values, a
simple approach could be to store a dictionary of the keys (in their original
order) with pointers (offsets) to the values. That way we could get the keys
quickly by scanning the dictionary and each value by scanning the dictionary
+ 1 hop to find the value. This certainly has the problem that the access
is linear in the number of the keys. But it is reasonably simple and it
would allow us to get a correct + testable implementation relatively soon
and to have a baseline for a more optimized representation.
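
The dictionary approach described above (keys in original order, each with an offset to its value) could be sketched in Python roughly as follows. This is a hedged illustration only: the header fields, the 2-byte key length, the 4-byte offsets and length prefixes, and the function names are all assumptions for the example, not the layout on the wiki or in VXQuery.

```python
import struct


def encode_object(pairs):
    """Encode (key: str, value: bytes) pairs as header + dictionary + values."""
    dictionary = b""
    values = b""
    for key, val in pairs:
        key_bytes = key.encode("utf-8")
        # Dictionary entry: 2-byte key length, key bytes, and a 4-byte
        # offset of the value relative to the start of the value area.
        dictionary += struct.pack(">H", len(key_bytes)) + key_bytes
        dictionary += struct.pack(">I", len(values))
        # Value: 4-byte length followed by the payload.
        values += struct.pack(">I", len(val)) + val
    # Header: number of pairs and byte length of the dictionary.
    header = struct.pack(">II", len(pairs), len(dictionary))
    return header + dictionary + values


def scan_dictionary(buf):
    """Yield (key, absolute value position) without touching any value."""
    count, dict_len = struct.unpack_from(">II", buf, 0)
    values_start = 8 + dict_len
    pos = 8
    for _ in range(count):
        (key_len,) = struct.unpack_from(">H", buf, pos)
        key = buf[pos + 2:pos + 2 + key_len].decode("utf-8")
        (offset,) = struct.unpack_from(">I", buf, pos + 2 + key_len)
        yield key, values_start + offset
        pos += 2 + key_len + 4


def keys(buf):
    # Only the dictionary is scanned; large nested values are skipped.
    return [key for key, _ in scan_dictionary(buf)]


def value(buf, name):
    # Linear scan of the dictionary plus one hop to the value.
    for key, pos in scan_dictionary(buf):
        if key == name:
            (val_len,) = struct.unpack_from(">I", buf, pos)
            return buf[pos + 4:pos + 4 + val_len]
    return None
```

Access is still linear in the number of keys, but neither function has to read through a value it does not need, which is the point of keeping the dictionary separate.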

Thoughts?

Cheers,
Till

[1]
http://jsoniq.org/docs/JSONiqExtensionToXQuery/html-single/index.html#idm139680641300880
On 8 May 2016, at 22:19, Riyafa Abdul Hameed wrote:

Hi Preston,
I have edited the wiki [1] and the doc [2] based on the comments. Thank you
for the suggestions provided. I have removed the part that assigns an id to
the keys and instead suggested that the keys be stored in the order they
appear in the JSON object. I am not sure I understand the concept of
hashcode--how to generate the hashcodes used for easy lookup?


[1] https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
[2] https://drive.google.com/open?id=1-wT0pE8rTTNIzuY4iTgvhqkdHmKGek4CgNthXN6mlm0
Thank you again.

Yours sincerely,
Riyafa

On 9 May 2016 at 01:23, christina pavlopoulou <[email protected]>
wrote:
Hi,
I updated the wiki page according to Preston's comments along with the
JSON array example in [1].

[1]
https://docs.google.com/document/d/1GOAcvhw_F9cJrNmRq2TwZxI0wYRmvLEV3mywJS4H9Lg/edit
Thank you,
Christina

On 5/8/2016 9:43 AM, Preston Carman wrote:

Nice job guys. I can see you are picking up how to create a data
model. I have limited my comments to the wiki [1] for now. At a high
level, I was impressed with your detail and thoughtful layouts. It
reminds me of the age old trade off: speed vs space. At this time,
let's err on the side of saving space. The data model should be as
compact as possible.

I also found the AsterixDB serialization [2] we can use as a
reference. Even though the AsterixDB data model includes object
length, I would leave that out since all the XQuery data models do not
include this property.
Riyafa, take a look at the method AsterixDB uses for quick lookups (a
hash value for the name). Consider the pros and cons between your
method and AsterixDB's method: a list of hash values for the names and a
sorted list of names.

Also, take a look at my wiki comments. It's a great start!

Mahalo,
Preston

[1] https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
[2]
https://cwiki.apache.org/confluence/display/ASTERIXDB/AsterixDB+Object+Serialization+Reference
On Sat, May 7, 2016 at 6:47 PM, christina pavlopoulou <[email protected]>
wrote:

Hi,
I also designed an example for the JSON array [1] given the description I
wrote in the wiki page.

[1]
https://docs.google.com/document/d/1GOAcvhw_F9cJrNmRq2TwZxI0wYRmvLEV3mywJS4H9Lg/edit
Thank you,
Christina


On 5/7/2016 11:22 AM, Riyafa Abdul Hameed wrote:

Hi,
I am attempting to create a doc on the JSONiq data model for objects [1]
(it might be full of errors because I am doing the calculations manually).

This is what I have come up with for the data model for objects:
The first byte would have the value tag, followed by the id (4 bytes) of
the object. Then 4 bytes to represent the size of the object. Then another
four bytes to represent the number of key-value pairs. The next few bytes
represent the offsets of the keys which follow (each offset is represented
by 4 bytes). Ids would be assigned to the keys. The next few bytes would be
a sorted list of ids for keys in alphabetical order. The following bytes
would represent the keys in the object. Each key is a StringPointable
followed by the id of the key. Each object would have a sequence pointable:
the following bytes would be the number of items (items are the values for
keys) in the sequence. The next bytes would be the offset of each item in
the sequence. The last bytes would be the values for each key followed by
the respective id of the key.

Hope it makes sense.

My problem is,
I have not provided for the white spaces in the object. What can I use to
represent the white spaces? I cannot use a text node because an object is
not a node.


[1]
https://drive.google.com/open?id=1-wT0pE8rTTNIzuY4iTgvhqkdHmKGek4CgNthXN6mlm0
Thank you.

Yours sincerely,
Riyafa
On 26 April 2016 at 10:29, Preston Carman <[email protected]>
wrote:
We have two students working with us this summer through GSOC to complete
the JSONiq specification for arrays and objects. I think the first step is
to define the data model used by JSONiq. The definition should be
documented in our wiki [1] before coding starts this summer. The wiki will
allow the community to discuss the JSON data model implementation in
VXQuery.
I updated the JSONiq wiki to help get the documentation started. Please
fill in the JSON data model based on the examples seen on our website
(links on the wiki page).

Post here if you have any questions.
[1] https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
--
Riyafa Abdul Hameed
Undergraduate, University of Moratuwa

Email: [email protected]
Website: https://riyafa.wordpress.com/ <http://riyafa.wordpress.com/>
<http://facebook.com/riyafa.ahf>  <http://lk.linkedin.com/in/riyafa>
<http://twitter.com/Riyafa1>
