Hi,

Is there any documentation I could read to understand how AsterixDB implements hash codes for open fields? I am not sure I understand the AsterixDB serialization reference [1] well enough to define the data model for objects using it.
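
For reference while reading the quoted thread, here is a minimal Python sketch of the dictionary layout that Till describes there: the keys are kept in their original order, each with an offset to its value, so that all keys can be listed without touching the values. It is only an illustration, not the wiki proposal and not AsterixDB's format. The field widths, the length-prefixed values, and the names encode_object, keys and value are assumptions made up for this example.

```python
import struct


def encode_object(pairs):
    """Encode (key, value_bytes) pairs into one buffer.

    Assumed layout (hypothetical, for illustration only):
      4 bytes  number of key-value pairs
      4 bytes  size of the dictionary in bytes
      dictionary entries in original key order:
        2 bytes key length, UTF-8 key, 4 bytes offset into the value area
      value area: each value as 4 bytes length + raw bytes
    """
    dictionary = bytearray()
    values = bytearray()
    for key, value in pairs:
        key_bytes = key.encode("utf-8")
        dictionary += struct.pack(">H", len(key_bytes)) + key_bytes
        # The offset is relative to the start of the value area.
        dictionary += struct.pack(">I", len(values))
        values += struct.pack(">I", len(value)) + value
    header = struct.pack(">II", len(pairs), len(dictionary))
    return bytes(header + dictionary + values)


def _scan_dictionary(buf):
    """Yield (key, absolute value position) without reading any value."""
    count, dict_size = struct.unpack_from(">II", buf, 0)
    pos = 8
    values_start = 8 + dict_size
    for _ in range(count):
        (key_len,) = struct.unpack_from(">H", buf, pos)
        pos += 2
        key = buf[pos:pos + key_len].decode("utf-8")
        pos += key_len
        (offset,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        yield key, values_start + offset


def keys(buf):
    """All keys in original order: a scan of the dictionary only."""
    return [key for key, _ in _scan_dictionary(buf)]


def value(buf, wanted):
    """Value for one key: scan the dictionary, then one hop to the value."""
    for key, pos in _scan_dictionary(buf):
        if key == wanted:
            (length,) = struct.unpack_from(">I", buf, pos)
            return buf[pos + 4:pos + 4 + length]
    return None


buf = encode_object([("name", b"VXQuery"), ("big", b"x" * 1000), ("year", b"2016")])
print(keys(buf))
print(value(buf, "year"))
```

In this sketch keys plays the role of jdm:keys and never reads the value area, so a huge nested value is skipped for free. value plays the role of jdm:value and is linear in the number of keys, as noted in the thread.
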
Sorry about any confusion.

[1] https://cwiki.apache.org/confluence/display/ASTERIXDB/AsterixDB+Object+Serialization+Reference

Thank you.
Riyafa

On 9 May 2016 at 20:16, Michael J. Carey <[email protected]> wrote:

> I think Preston's suggestion of looking at the AsterixDB implementation of
> its binary data model is a good one, as it shares the requirement of
> efficient field access by name, and several VXQuery folks are experts in
> its details as well. I believe it uses a sorted list instead of a hash
> table internally, which is perhaps slightly simpler for updates.
>
> On May 9, 2016 7:35 AM, "Riyafa Abdul Hameed" <[email protected]> wrote:
>
> Hi again,
>
> I have been thinking about Till's suggestion of using a dictionary, and I
> think it would be a better alternative because then we wouldn't have to
> process the value tag of the value of a particular key before moving to
> the next key. Hence it would be easy to implement the jdm:keys method. Any
> suggestions? Shall I update the wiki and the doc based on this?
>
> Thank you.
> Riyafa
>
> On 9 May 2016 at 19:21, Riyafa Abdul Hameed <[email protected]> wrote:
>
> > Hi Till,
> >
> > Currently I have suggested storing each key followed by its value. This
> > uses less space and is quite similar to storing the offsets of the
> > values, and the access is also linear in the number of keys.
> >
> > Thanks.
> > Riyafa
> >
> > On 9 May 2016 at 18:54, Till Westmann <[email protected]> wrote:
> >
> >> All of this looks pretty good!
> >>
> >> Wrt. the question of the dictionary for the fields, I think that we
> >> should consider the 2 ways that we can access an object:
> >> 1. Either we get all keys (jdm:keys) or
> >> 2. we get a value for a key (jdm:value).
> >>
> >> To get all the keys efficiently and to be able to skip huge nested
> >> values, a simple approach could be to store a dictionary of the keys (in
> >> their original order) with pointers (offsets) to the values.
> >> That way we could get the keys quickly by scanning the dictionary, and
> >> each value by scanning the dictionary + 1 hop to find the value. This
> >> certainly has the problem that the access is linear in the number of
> >> keys. But it is reasonably simple, and it would allow us to get a
> >> correct + testable implementation relatively soon and to have a baseline
> >> for a more optimized representation.
> >>
> >> Thoughts?
> >>
> >> Cheers,
> >> Till
> >>
> >> [1] http://jsoniq.org/docs/JSONiqExtensionToXQuery/html-single/index.html#idm139680641300880
> >>
> >> On 8 May 2016, at 22:19, Riyafa Abdul Hameed wrote:
> >>
> >>> Hi Preston,
> >>>
> >>> I have edited the wiki [1] and the doc [2] based on the comments. Thank
> >>> you for the suggestions provided. I have removed the part that assigns
> >>> an id to the keys and instead suggested that the keys be stored in the
> >>> order they appear in the JSON object. I am not sure I understand the
> >>> concept of the hash code: how are the hash codes used for easy lookup
> >>> generated?
> >>>
> >>> [1] https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
> >>> [2] https://drive.google.com/open?id=1-wT0pE8rTTNIzuY4iTgvhqkdHmKGek4CgNthXN6mlm0
> >>>
> >>> Thank you again.
> >>>
> >>> Yours sincerely,
> >>> Riyafa
> >>>
> >>> On 9 May 2016 at 01:23, christina pavlopoulou <[email protected]> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> I updated the wiki page according to Preston's comments along with the
> >>>> JSON array example in [1].
> >>>>
> >>>> [1] https://docs.google.com/document/d/1GOAcvhw_F9cJrNmRq2TwZxI0wYRmvLEV3mywJS4H9Lg/edit
> >>>>
> >>>> Thank you,
> >>>> Christina
> >>>>
> >>>> On 5/8/2016 9:43 AM, Preston Carman wrote:
> >>>>
> >>>>> Nice job guys. I can see you are picking up how to create a data
> >>>>> model. I have limited my comments to the wiki [1] for now. At a high
> >>>>> level, I was impressed with your detail and thoughtful layouts.
> >>>>> It reminds me of the age-old trade-off: speed vs. space. At this
> >>>>> time, let's err on the side of saving space. The data model should be
> >>>>> as compact as possible.
> >>>>>
> >>>>> I also found the AsterixDB serialization [2] we can use as a
> >>>>> reference. Even though the AsterixDB data model includes object
> >>>>> length, I would leave that out since none of the XQuery data models
> >>>>> include this property.
> >>>>>
> >>>>> Riyafa, take a look at the method AsterixDB uses for quick lookups (a
> >>>>> hash value for the name). Consider the pros and cons between your
> >>>>> method and AsterixDB's method: a list of hash values for the names
> >>>>> and a sorted list of names.
> >>>>>
> >>>>> Also, take a look at my wiki comments. It's a great start!
> >>>>>
> >>>>> Mahalo,
> >>>>> Preston
> >>>>>
> >>>>> [1] https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
> >>>>> [2] https://cwiki.apache.org/confluence/display/ASTERIXDB/AsterixDB+Object+Serialization+Reference
> >>>>>
> >>>>> On Sat, May 7, 2016 at 6:47 PM, christina pavlopoulou <[email protected]> wrote:
> >>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> I also designed an example for the JSON array [1], given the
> >>>>>> description I wrote in the wiki page.
> >>>>>>
> >>>>>> [1] https://docs.google.com/document/d/1GOAcvhw_F9cJrNmRq2TwZxI0wYRmvLEV3mywJS4H9Lg/edit
> >>>>>>
> >>>>>> Thank you,
> >>>>>> Christina
> >>>>>>
> >>>>>> On 5/7/2016 11:22 AM, Riyafa Abdul Hameed wrote:
> >>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> I am attempting to create a doc on the JSONiq data model for
> >>>>>>> objects [1] (it might be full of errors because I am doing the
> >>>>>>> calculations manually).
> >>>>>>>
> >>>>>>> This is what I have come up with for the data model for objects:
> >>>>>>>
> >>>>>>> The first byte would hold the value tag, followed by the id (4
> >>>>>>> bytes) of the object.
> >>>>>>> Then 4 bytes represent the size of the object, and another 4 bytes
> >>>>>>> represent the number of key-value pairs. The next few bytes
> >>>>>>> represent the offsets of the keys which follow (each offset is
> >>>>>>> represented by 4 bytes). Ids would be assigned to the keys. The
> >>>>>>> next few bytes would be a list of the key ids, sorted by the
> >>>>>>> alphabetical order of the keys. The following bytes would represent
> >>>>>>> the keys in the object. Each key is a StringPointable followed by
> >>>>>>> the id of the key. Each object would have a sequence pointable: the
> >>>>>>> following bytes would be the number of items (items are the values
> >>>>>>> for keys) in the sequence. The next bytes would be the offset of
> >>>>>>> each item in the sequence. The last bytes would be the values for
> >>>>>>> each key, each followed by the respective id of the key.
> >>>>>>>
> >>>>>>> Hope it makes sense.
> >>>>>>>
> >>>>>>> My problem is that I have not provided for the white space in the
> >>>>>>> object. What can I use to represent the white space? I cannot use a
> >>>>>>> text node because an object is not a node.
> >>>>>>>
> >>>>>>> [1] https://drive.google.com/open?id=1-wT0pE8rTTNIzuY4iTgvhqkdHmKGek4CgNthXN6mlm0
> >>>>>>>
> >>>>>>> Thank you.
> >>>>>>>
> >>>>>>> Yours sincerely,
> >>>>>>> Riyafa
> >>>>>>>
> >>>>>>> On 26 April 2016 at 10:29, Preston Carman <[email protected]> wrote:
> >>>>>>>
> >>>>>>>> We have two students working with us this summer through GSoC to
> >>>>>>>> complete the JSONiq specification for arrays and objects. I think
> >>>>>>>> the first step is to define the data model used by JSONiq.
> >>>>>>>> The definition should be written up in our wiki [1] before coding
> >>>>>>>> starts this summer. The wiki will allow the community to discuss
> >>>>>>>> the JSON data model implementation in VXQuery.
> >>>>>>>>
> >>>>>>>> I updated the JSONiq wiki to help get the documentation started.
> >>>>>>>> Please fill in the JSON data model based on the examples seen on
> >>>>>>>> our website (links on the wiki page).
> >>>>>>>>
> >>>>>>>> Post here if you have any questions.
> >>>>>>>>
> >>>>>>>> [1] https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
> >>>
> >>> --
> >>> Riyafa Abdul Hameed
> >>> Undergraduate, University of Moratuwa
> >>>
> >>> Email: [email protected]
> >>> Website: https://riyafa.wordpress.com/ <http://riyafa.wordpress.com/>
> >>> <http://facebook.com/riyafa.ahf> <http://lk.linkedin.com/in/riyafa>
> >>> <http://twitter.com/Riyafa1>
