I think we have three options: optimize for space, for keys (jdm:keys), or for field lookup (jdm:value). The optimizations for keys and for field lookup could be done independently. Let's treat the option currently in the wiki as option 1 (space). Don't remove it from the wiki, so we keep it as a reference. The new options for keys and field lookup can be added as options 2 and 3.
Option 1 (space): A tightly compact format optimized to save space.
Option 2 (keys): A data model optimized for accessing the list of keys.
Option 3 (lookup): A data model optimized for accessing a single field in the object.

For option 2 (keys):
Consider the return type of jdm:keys:

jdm:keys($o as object()) as xs:string*

I am not sure I fully understand what xs:string* represents. Is it a sequence of strings, as in XQuery, an array in JSONiq, or some other structure? The most efficient way to return the keys would be to store them exactly the way they are returned. Then producing the result is a simple copy, with no further processing. In that case, storing the keys as a sequence (or array) of string values might be the best option. The values would then be a separate sequence (or array) of typed values in the object data model.
Pro: a simple jdm:keys implementation.
Con: an extra list of offsets for the keys.

For option 3 (lookup):
This option is independent of option 2. As Till suggested, we can implement it at a later date. We would need a method to speed up the lookup of a field: options 1 and 2 require a sequential search of the keys, with a string comparison at each field. The AsterixDB record data model is a little more complex than I first thought. Take a look at their record implementation: writing the record [1] (lines 205 to 245 are interesting) and field lookup [2] (lines 277 to 344). We only need to consider the open part of the record; the closed part can be ignored.

Comments? Also, what is the actual result of jdm:keys? What is the requirement for the initial implementation?
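To make option 2 concrete, here is a minimal sketch in Java under a hypothetical layout (the class name KeysFirstObject, the 2-byte key length prefix and the absolute offsets are my assumptions, not the VXQuery or AsterixDB format): all keys are stored together in their original order, followed by all values, with one offset per key and one per value. jdm:keys only walks the key block; jdm:value still scans the keys linearly and then makes one hop to the value.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical option 2 layout (an illustration, not the VXQuery format):
 * [int n][int keyOffset x n][int valueOffset x n][keys...][values...]
 * Keys are stored in their original order as [short length][UTF-8 bytes];
 * values are opaque, already-serialized typed values. Offsets are absolute
 * positions within the buffer.
 */
final class KeysFirstObject {
    private final ByteBuffer buf;

    private KeysFirstObject(ByteBuffer buf) {
        this.buf = buf;
    }

    static KeysFirstObject build(LinkedHashMap<String, byte[]> fields) {
        int n = fields.size();
        List<byte[]> keys = new ArrayList<>();
        int bodyLen = 0;
        for (Map.Entry<String, byte[]> e : fields.entrySet()) {
            byte[] k = e.getKey().getBytes(StandardCharsets.UTF_8);
            keys.add(k);
            bodyLen += 2 + k.length + e.getValue().length;
        }
        int header = 4 + 8 * n;
        ByteBuffer b = ByteBuffer.allocate(header + bodyLen);
        b.putInt(n);
        int pos = header;
        for (byte[] k : keys) {            // the extra list of key offsets
            b.putInt(pos);
            pos += 2 + k.length;
        }
        for (byte[] v : fields.values()) { // value offsets
            b.putInt(pos);
            pos += v.length;
        }
        for (byte[] k : keys) {
            b.putShort((short) k.length).put(k);
        }
        for (byte[] v : fields.values()) {
            b.put(v);
        }
        return new KeysFirstObject(b);
    }

    int size() {
        return buf.getInt(0);
    }

    private byte[] slice(int start, int end) {
        byte[] out = new byte[end - start];
        ByteBuffer d = buf.duplicate();
        d.position(start);
        d.get(out);
        return out;
    }

    private String keyAt(int i) {
        int off = buf.getInt(4 + 4 * i);
        int len = buf.getShort(off) & 0xFFFF;
        return new String(slice(off + 2, off + 2 + len), StandardCharsets.UTF_8);
    }

    /** jdm:keys: walks the key block only; values are never read. */
    List<String> keys() {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < size(); i++) {
            out.add(keyAt(i));
        }
        return out;
    }

    /** jdm:value: linear scan of the keys, then one hop to the value. */
    byte[] value(String name) {
        int n = size();
        int valueOffsets = 4 + 4 * n;
        for (int i = 0; i < n; i++) {
            if (keyAt(i).equals(name)) {
                int start = buf.getInt(valueOffsets + 4 * i);
                int end = (i + 1 < n) ? buf.getInt(valueOffsets + 4 * (i + 1)) : buf.capacity();
                return slice(start, end);
            }
        }
        return null; // key not present
    }
}
```

In this sketch jdm:keys never has to decode a value tag to skip over a value; the cost is the key offset list noted in the Con above, and field lookup remains linear in the number of keys, which is what option 3 would address.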
[1] https://github.com/apache/incubator-asterixdb/blob/master/asterixdb/asterix-om/src/main/java/org/apache/asterix/builders/RecordBuilder.java
[2] https://github.com/apache/incubator-asterixdb/blob/master/asterixdb/asterix-om/src/main/java/org/apache/asterix/dataflow/data/nontagged/serde/ARecordSerializerDeserializer.java

On Mon, May 9, 2016 at 8:35 AM, Riyafa Abdul Hameed wrote:
> Hi,
>
> Is there any documentation I could go through to understand the AsterixDB hash code implementation on the open fields? I am not sure I understand enough from the AsterixDB serialization [1] to define the data model for objects using it.
>
> Sorry about any confusion.
>
> [1] https://cwiki.apache.org/confluence/display/ASTERIXDB/AsterixDB+Object+Serialization+Reference
>
> Thank you.
> Riyafa
>
> On 9 May 2016 at 20:16, Michael J. Carey wrote:
>
>> I think Preston's suggestion of looking at the AsterixDB implementation of its binary data model is a good one, as it shares the efficient field access by name requirements, and several VXQuery folks are experts in its details as well. I believe it uses a sorted list internally instead of a hash table, which is perhaps slightly simpler for updates.
>>
>> On May 9, 2016 7:35 AM, "Riyafa Abdul Hameed" wrote:
>>
>> Hi again,
>>
>> I have been thinking about Till's suggestion of using a dictionary, and I think it would be a better alternative, because then we wouldn't have to process the value tag of a key's value before moving to the next key. Hence it would be easy to implement the jdm:keys method. Any suggestions? Shall I update the wiki and the doc based on this?
>>
>> Thank you.
>> Riyafa
>>
>> On 9 May 2016 at 19:21, Riyafa Abdul Hameed wrote:
>>
>> > Hi Till,
>> >
>> > Currently I have suggested storing each key followed by its value. This uses less space, is quite similar to storing the offsets of the values, and access is also linear in the number of keys.
>> >
>> > Thanks.
>> > Riyafa
>> >
>> > On 9 May 2016 at 18:54, Till Westmann wrote:
>> >
>> >> All of this looks pretty good!
>> >>
>> >> Wrt. the question of the dictionary for the fields, I think that we should consider the 2 ways that we can access an object:
>> >> 1. Either we get all keys (jdm:keys) or
>> >> 2. we get a value for a key (jdm:value).
>> >>
>> >> To get all the keys efficiently and to be able to skip huge nested values, a simple approach could be to store a dictionary of the keys (in their original order) with pointers (offsets) to the values. That way we could get the keys quickly by scanning the dictionary, and each value by scanning the dictionary + 1 hop to find the value. This certainly has the problem that access is linear in the number of keys. But it is reasonably simple, and it would allow us to get a correct + testable implementation relatively soon and to have a baseline for a more optimized representation.
>> >>
>> >> Thoughts?
>> >>
>> >> Cheers,
>> >> Till
>> >>
>> >> [1] http://jsoniq.org/docs/JSONiqExtensionToXQuery/html-single/index.html#idm139680641300880
>> >>
>> >> On 8 May 2016, at 22:19, Riyafa Abdul Hameed wrote:
>> >>
>> >>> Hi Preston,
>> >>>
>> >>> I have edited the wiki [1] and the doc [2] based on the comments. Thank you for the suggestions provided. I have removed the part that assigns an id to the keys and instead suggested that the keys be stored in the order they appear in the JSON object. I am not sure I understand the concept of a hash code: how do we generate the hash codes used for easy lookup?
>> >>>
>> >>> [1] https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
>> >>> [2] https://drive.google.com/open?id=1-wT0pE8rTTNIzuY4iTgvhqkdHmKGek4CgNthXN6mlm0
>> >>>
>> >>> Thank you again.
>> >>>
>> >>> Yours sincerely,
>> >>> Riyafa
>> >>>
>> >>> On 9 May 2016 at 01:23, christina pavlopoulou wrote:
>> >>>
>> >>>> Hi,
>> >>>>
>> >>>> I updated the wiki page according to Preston's comments, along with the JSON array example in [1].
>> >>>>
>> >>>> [1] https://docs.google.com/document/d/1GOAcvhw_F9cJrNmRq2TwZxI0wYRmvLEV3mywJS4H9Lg/edit
>> >>>>
>> >>>> Thank you,
>> >>>> Christina
>> >>>>
>> >>>> On 5/8/2016 9:43 AM, Preston Carman wrote:
>> >>>>
>> >>>>> Nice job guys. I can see you are picking up how to create a data model. I have limited my comments to the wiki [1] for now. At a high level, I was impressed with your detail and thoughtful layouts. It reminds me of the age-old trade-off: speed vs. space. At this time, let's err on the side of saving space. The data model should be as compact as possible.
>> >>>>>
>> >>>>> I also found the AsterixDB serialization [2], which we can use as a reference. Even though the AsterixDB data model includes the object length, I would leave that out, since none of the XQuery data models include this property.
>> >>>>>
>> >>>>> Riyafa, take a look at the method AsterixDB uses for quick lookups (a hash value for the name). Consider the pros and cons of your method and AsterixDB's method: a list of hash values for the names and a sorted list of names.
>> >>>>>
>> >>>>> Also, take a look at my wiki comments. It's a great start!
>> >>>>>
>> >>>>> Mahalo,
>> >>>>> Preston
>> >>>>>
>> >>>>> [1] https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
>> >>>>> [2] https://cwiki.apache.org/confluence/display/ASTERIXDB/AsterixDB+Object+Serialization+Reference
>> >>>>>
>> >>>>> On Sat, May 7, 2016 at 6:47 PM, christina pavlopoulou wrote:
>> >>>>>
>> >>>>>> Hi,
>> >>>>>>
>> >>>>>> I also designed an example for the JSON array [1], based on the description I wrote in the wiki page.
>> >>>>>>
>> >>>>>> [1] https://docs.google.com/document/d/1GOAcvhw_F9cJrNmRq2TwZxI0wYRmvLEV3mywJS4H9Lg/edit
>> >>>>>>
>> >>>>>> Thank you,
>> >>>>>> Christina
>> >>>>>>
>> >>>>>> On 5/7/2016 11:22 AM, Riyafa Abdul Hameed wrote:
>> >>>>>>
>> >>>>>>> Hi,
>> >>>>>>>
>> >>>>>>> I am attempting to create a doc on the JSONiq data model for objects [1] (it might be full of errors, because I am doing the calculations manually).
>> >>>>>>>
>> >>>>>>> This is what I have come up with for the data model for objects:
>> >>>>>>>
>> >>>>>>> The first byte would hold the value tag, followed by the id (4 bytes) of the object. Then 4 bytes represent the size of the object, and another 4 bytes represent the number of key-value pairs. The next bytes represent the offsets of the keys that follow (each offset is represented by 4 bytes). Ids would be assigned to the keys. The next bytes would be a list of the key ids, sorted alphabetically by key. The following bytes would represent the keys in the object. Each key is a StringPointable followed by the id of the key. Each object would have a sequence pointable: the following bytes would be the number of items (the items are the values of the keys) in the sequence. The next bytes would be the offset of each item in the sequence. The last bytes would be the values for each key, each followed by the respective id of the key.
>> >>>>>>>
>> >>>>>>> Hope it makes sense.
>> >>>>>>>
>> >>>>>>> My problem is this: I have not provided for the white space in the object. What can I use to represent the white space? I cannot use a text node, because an object is not a node.
>> >>>>>>>
>> >>>>>>> [1] https://drive.google.com/open?id=1-wT0pE8rTTNIzuY4iTgvhqkdHmKGek4CgNthXN6mlm0
>> >>>>>>>
>> >>>>>>> Thank you.
>> >>>>>>>
>> >>>>>>> Yours sincerely,
>> >>>>>>> Riyafa
>> >>>>>>>
>> >>>>>>> On 26 April 2016 at 10:29, Preston Carman wrote:
>> >>>>>>>
>> >>>>>>>> We have two students working with us this summer through GSoC to complete the JSONiq specification for arrays and objects. I think the first step is to define the data model used by JSONiq. The definition should be written in our wiki [1] before coding starts this summer. The wiki will allow the community to discuss the JSON data model implementation in VXQuery.
>> >>>>>>>>
>> >>>>>>>> I updated the JSONiq wiki to help get the documentation started. Please fill in the JSON data model based on the examples seen on our website (links on the wiki page).
>> >>>>>>>>
>> >>>>>>>> Post here if you have any questions.
>> >>>>>>>>
>> >>>>>>>> [1] https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
>> >>>
>> >>> --
>> >>> Riyafa Abdul Hameed
>> >>> Undergraduate, University of Moratuwa
>> >>>
>> >>> Website: https://riyafa.wordpress.com/
>> >>> <http://facebook.com/riyafa.ahf> <http://lk.linkedin.com/in/riyafa> <http://twitter.com/Riyafa1>
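For option 3 (lookup), the AsterixDB-style approach discussed in this thread (a hash value per field name, with the list of hashes kept sorted) could be sketched as below. This is a minimal sketch under assumptions that are not from the thread: the class name HashedLookupObject is hypothetical, String.hashCode() stands in for whatever hash AsterixDB actually uses, and the arrays are kept in memory rather than serialized. Keys stay in their original order for jdm:keys, while jdm:value binary-searches the hashes instead of comparing every name.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical option 3 sketch: field lookup through a sorted list of
 * name hashes. Plain arrays for readability; a serialized form would
 * write these arrays contiguously. Not the AsterixDB format.
 */
final class HashedLookupObject {
    private final String[] names;   // keys in original order (jdm:keys)
    private final byte[][] values;  // serialized values, same order
    private final long[] index;     // (hash << 32) | fieldIndex, sorted ascending

    HashedLookupObject(LinkedHashMap<String, byte[]> fields) {
        int n = fields.size();
        names = new String[n];
        values = new byte[n][];
        index = new long[n];
        int i = 0;
        for (Map.Entry<String, byte[]> e : fields.entrySet()) {
            names[i] = e.getKey();
            values[i] = e.getValue();
            index[i] = ((long) hash(e.getKey()) << 32) | i;
            i++;
        }
        Arrays.sort(index); // orders by hash first, then by field index
    }

    // Assumption: String.hashCode() stands in for the real name hash.
    private static int hash(String name) {
        return name.hashCode();
    }

    private static int hashOf(long entry) {
        return (int) (entry >> 32);
    }

    List<String> keys() {
        return List.of(names);
    }

    /** jdm:value: binary search on the hash, then compare names that share it. */
    byte[] value(String name) {
        int h = hash(name);
        int lo = 0;
        int hi = index.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int midHash = hashOf(index[mid]);
            if (midHash < h) {
                lo = mid + 1;
            } else if (midHash > h) {
                hi = mid - 1;
            } else {
                // Different names can share a hash: back up to the first
                // entry with this hash, then check each candidate's name.
                int j = mid;
                while (j > 0 && hashOf(index[j - 1]) == h) {
                    j--;
                }
                for (; j < index.length && hashOf(index[j]) == h; j++) {
                    int field = (int) index[j];
                    if (names[field].equals(name)) {
                        return values[field];
                    }
                }
                return null;
            }
        }
        return null; // key not present
    }
}
```

The name comparison is still required on a hash match, since distinct names can collide ("Aa" and "BB" both have String.hashCode() 2112); what the sorted hash list removes is the sequential string comparison at every field that options 1 and 2 need.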
