Re: JSONiq data model

Till Westmann Mon, 09 May 2016 06:24:53 -0700

All of this looks pretty good!

Wrt. the question of the dictionary for the fields, I think that we
should consider the 2 ways that we can access an object:
1. Either we get all keys (jdm:keys) or
2. we get a value for a key (jdm:value).

To get all the keys efficiently and to be able to skip huge nested
values, a simple approach could be to store a dictionary of the keys (in
their original order) with pointers (offsets) to the values. That way we
could get the keys quickly by scanning the dictionary, and each value by
scanning the dictionary + 1 hop to find the value. This certainly has
the problem that the access is linear in the number of keys. But it is
reasonably simple and it would allow us to get a correct + testable
implementation relatively soon and to have a baseline for a more
optimized representation.
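
To make the idea concrete, here is a minimal sketch of such a layout in
Python. Everything in it is an assumption made for illustration, not the
VXQuery format: the big-endian field widths, the 4-byte count header,
the length-prefixed values, and the function names `encode_object`,
`keys` and `value` (the last two only loosely mirror jdm:keys and
jdm:value).

```python
import struct

def encode_object(pairs):
    """Encode (key, value_bytes) pairs as: count | key dictionary | values.

    Hypothetical layout: each dictionary entry is
    key length (2 bytes) | key UTF-8 bytes | absolute value offset (4 bytes),
    and each value is stored as length (4 bytes) | raw bytes.
    """
    encoded_keys = [key.encode("utf-8") for key, _ in pairs]
    dict_size = sum(2 + len(k) + 4 for k in encoded_keys)
    values_start = 4 + dict_size  # values begin right after the dictionary
    dictionary = bytearray()
    values = bytearray()
    for key_bytes, (_, value_bytes) in zip(encoded_keys, pairs):
        dictionary += struct.pack(">H", len(key_bytes)) + key_bytes
        dictionary += struct.pack(">I", values_start + len(values))
        values += struct.pack(">I", len(value_bytes)) + value_bytes
    return struct.pack(">I", len(pairs)) + bytes(dictionary) + bytes(values)

def _scan(buf):
    """Yield (key, value_offset) per dictionary entry, in original order."""
    (count,) = struct.unpack_from(">I", buf, 0)
    pos = 4
    for _ in range(count):
        (key_len,) = struct.unpack_from(">H", buf, pos)
        key = buf[pos + 2 : pos + 2 + key_len].decode("utf-8")
        (offset,) = struct.unpack_from(">I", buf, pos + 2 + key_len)
        yield key, offset
        pos += 2 + key_len + 4

def keys(buf):
    """All keys: one pass over the dictionary, never touching the values."""
    return [key for key, _ in _scan(buf)]

def value(buf, wanted):
    """One value: linear scan of the dictionary, then one hop to the offset."""
    for key, offset in _scan(buf):
        if key == wanted:
            (length,) = struct.unpack_from(">I", buf, offset)
            return buf[offset + 4 : offset + 4 + length]
    return None
```

With this layout a large nested value costs nothing when only the keys
are requested, and a single lookup reads at most the dictionary plus the
one value it needs.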

Thoughts?

Cheers,
Till

[1]http://jsoniq.org/docs/JSONiqExtensionToXQuery/html-single/index.html#idm139680641300880


On 8 May 2016, at 22:19, Riyafa Abdul Hameed wrote:

Hi Preston,
I have edited the wiki [1] and the doc [2] based on the comments. Thank
you for the suggestions provided. I have removed the part that assigns
an id to the keys and instead suggested that the keys be stored in the
order they appear in the JSON object. I am not sure I understand the
concept of the hashcode: how should the hashcodes used for easy lookup
be generated?


[1]https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
[2]
https://drive.google.com/open?id=1-wT0pE8rTTNIzuY4iTgvhqkdHmKGek4CgNthXN6mlm0

Thank you again.

Yours sincerely,
Riyafa
On 9 May 2016 at 01:23, christina pavlopoulou <[email protected]> wrote:
Hi,
I updated the wiki page according to Preston's comments, along with the
JSON array example in [1].

[1]
https://docs.google.com/document/d/1GOAcvhw_F9cJrNmRq2TwZxI0wYRmvLEV3mywJS4H9Lg/edit

Thank you,
Christina

On 5/8/2016 9:43 AM, Preston Carman wrote:
Nice job guys. I can see you are picking up how to create a data
model. I have limited my comments to the wiki [1] for now. At a high
level, I was impressed with your detail and thoughtful layouts. It
reminds me of the age-old trade-off: speed vs. space. At this time,
let's err on the side of saving space. The data model should be as
compact as possible.

I also found the AsterixDB serialization [2] we can use as a
reference. Even though the AsterixDB data model includes object
length, I would leave that out since none of the XQuery data models
include this property.
Riyafa, take a look at the method AsterixDB uses for quick lookups (a
hash value for the name). Consider the pros and cons between your
method and AsterixDB's method: a list of hash values for the names and
a sorted list of names.
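
As a rough aid for weighing those two options, the sketch below builds
both lookup structures over the same field names. It is an illustration
under stated assumptions only: `zlib.crc32` stands in for whatever hash
function would really be used (it is not taken from AsterixDB), and all
function names are hypothetical.

```python
import bisect
import zlib

def build_sorted_names(names):
    """Sorted (name, position) pairs; position is the field's original index."""
    return sorted((name, i) for i, name in enumerate(names))

def lookup_sorted(index, wanted):
    """Binary search on the names themselves (string comparisons)."""
    i = bisect.bisect_left(index, (wanted, -1))
    if i < len(index) and index[i][0] == wanted:
        return index[i][1]
    return None

def name_hash(name):
    # Stand-in hash function, chosen only because it is deterministic.
    return zlib.crc32(name.encode("utf-8"))

def build_hash_list(names):
    """Sorted (hash, position) pairs; entries have a fixed width."""
    return sorted((name_hash(name), i) for i, name in enumerate(names))

def lookup_hashed(index, names, wanted):
    """Binary search on integer hashes, then confirm the name to rule out collisions."""
    h = name_hash(wanted)
    i = bisect.bisect_left(index, (h, -1))
    while i < len(index) and index[i][0] == h:
        position = index[i][1]
        if names[position] == wanted:
            return position
        i += 1
    return None
```

Both lookups are logarithmic in the number of fields. The hash list
compares fixed-width integers but has to re-check the name on a match,
while the sorted name list compares strings directly and needs no
collision handling.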

Also, take a look at my wiki comments. It's a great start!

Mahalo,
Preston

[1] https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
[2]
https://cwiki.apache.org/confluence/display/ASTERIXDB/AsterixDB+Object+Serialization+Reference
On Sat, May 7, 2016 at 6:47 PM, christina pavlopoulou <[email protected]> wrote:
Hi,

I also designed an example for the JSON array [1], based on the
description I wrote in the wiki page.

[1]
https://docs.google.com/document/d/1GOAcvhw_F9cJrNmRq2TwZxI0wYRmvLEV3mywJS4H9Lg/edit

Thank you,
Christina


On 5/7/2016 11:22 AM, Riyafa Abdul Hameed wrote:
Hi,
I am attempting to create a doc on the JSONiq data model for objects [1]
(it might be full of errors because I am doing the calculations
manually).
This is what I have come up with for the data model for objects:
The first byte would have the value tag, followed by the id (4 bytes) of
the object. Then 4 bytes to represent the size of the object. Then
another four bytes to represent the number of key-value pairs. The next
few bytes represent the offsets of the keys which follow (each offset is
represented by 4 bytes). Ids would be assigned to the keys. The next few
bytes would be a sorted list of ids for the keys in alphabetical order.
The following bytes would represent the keys in the object. Each key is
a StringPointable followed by the id of the key. Each object would have
a sequence pointable: the following bytes would be the number of items
(items are the values for keys) in the sequence. The next bytes would be
the offset of each item in the sequence. The last bytes would be the
values for each key, each followed by the respective id of the key.
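
The fixed-width start of this layout (tag, id, size, pair count, key
offsets) can be sketched as follows. This is only an illustration of the
description above: the tag value `OBJECT_TAG`, the big-endian byte order
and the function names are assumptions, and the variable-length parts
(sorted ids, keys, sequence of values) are left out.

```python
import struct

OBJECT_TAG = 0x6F  # hypothetical value tag for "object"

def pack_header(object_id, size, key_offsets):
    """tag (1 byte) | id (4) | size (4) | number of pairs (4) | key offsets (4 each)."""
    header = struct.pack(">BIII", OBJECT_TAG, object_id, size, len(key_offsets))
    return header + struct.pack(">%dI" % len(key_offsets), *key_offsets)

def unpack_header(buf):
    """Read the fields back; the offsets start at byte 13 (1 + 4 + 4 + 4)."""
    tag, object_id, size, count = struct.unpack_from(">BIII", buf, 0)
    offsets = struct.unpack_from(">%dI" % count, buf, 13)
    return tag, object_id, size, list(offsets)
```

Under these assumptions an object with n keys spends 13 + 4n bytes
before the first key is stored, which is useful to keep in mind when
comparing the space cost of alternative layouts.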

Hope it makes sense.

My problem is that I have not provided for the white spaces in the
object. What can I use to represent the white spaces? I cannot use a
text node because an object is not a node.


[1]
https://drive.google.com/open?id=1-wT0pE8rTTNIzuY4iTgvhqkdHmKGek4CgNthXN6mlm0

Thank you.

Yours sincerely,
Riyafa
On 26 April 2016 at 10:29, Preston Carman <[email protected]> wrote:
We have two students working with us this summer through GSOC to
complete the JSONiq specification for arrays and objects. I think the
first step is to define the data model used by JSONiq. The definition
should be documented in our wiki [1] before coding starts this summer.
The wiki will allow the community to discuss the JSON data model
implementation in VXQuery.
I updated the JSONiq wiki to help get the documentation started. Please
fill in the JSON data model based on the examples seen on our website
(links on the wiki page).

Post here if you have any questions.

[1] https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
--
Riyafa Abdul Hameed
Undergraduate, University of Moratuwa

Email: [email protected]
Website: https://riyafa.wordpress.com/ <http://riyafa.wordpress.com/>
<http://facebook.com/riyafa.ahf>  <http://lk.linkedin.com/in/riyafa>
<http://twitter.com/Riyafa1>
