Hi,

After going through the alternative data models for representing objects, including ones with more optimized lookup, the decision is to go with the most basic model: Option 1, as suggested by Preston and recorded in the wiki [1]. The reasoning is that once things work with the simple method, further optimization can be carried out on top of it.
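To make the trade-off concrete, here is a minimal sketch in Python of what a layout in the spirit of Option 1 could look like: key-value pairs stored back to back in their original order, with a sequential scan for both jdm:keys and jdm:value. This is not the layout recorded in the wiki. The tag value, the 2-byte and 4-byte length prefixes, and the names encode_object, keys and value are assumptions made only for illustration.

```python
import struct

OBJECT_TAG = 0x01  # hypothetical value tag, not the tag VXQuery actually uses


def encode_object(pairs):
    """Serialize (key, value_bytes) pairs back to back, in their original order."""
    out = bytearray([OBJECT_TAG])
    out += struct.pack(">I", len(pairs))  # 4 bytes: number of key-value pairs
    for key, value in pairs:
        key_bytes = key.encode("utf-8")
        out += struct.pack(">H", len(key_bytes)) + key_bytes  # length-prefixed key
        out += struct.pack(">I", len(value)) + value  # length-prefixed, already-encoded value
    return bytes(out)


def _entries(buf):
    """Yield (key, value_bytes) by walking the buffer from the first pair to the last."""
    (count,) = struct.unpack_from(">I", buf, 1)
    pos = 5  # skip the tag byte and the 4-byte pair count
    for _ in range(count):
        (key_len,) = struct.unpack_from(">H", buf, pos)
        pos += 2
        key = buf[pos:pos + key_len].decode("utf-8")
        pos += key_len
        (value_len,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        yield key, buf[pos:pos + value_len]
        pos += value_len


def keys(buf):
    """Return all keys in their original order (one pass over the object)."""
    return [key for key, _ in _entries(buf)]


def value(buf, name):
    """Return the encoded value stored under name, or None (linear in the number of keys)."""
    for key, encoded in _entries(buf):
        if key == name:
            return encoded
    return None
```

For example, encode_object([("name", b"vx"), ("year", b"\x07\xe0")]) yields a 29-byte buffer. Both keys() and value() walk it from the start, which is the linear access cost discussed in the thread below. In this sketch the length prefix on each value is what lets the scan skip a value without decoding it.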
[1] https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq

Thank you.
Riyafa

On 10 May 2016 at 10:59, Michael Carey <[email protected]> wrote:

> Sounds like a great plan!
>
> On 5/9/16 10:18 PM, Till Westmann wrote:
>
>> On 9 May 2016, at 12:02, Preston Carman wrote:
>>
>>> I think we have three options: optimize for space, keys (jdm:keys) or
>>> field lookup (jdm:value). The optimization for keys and field lookup
>>> could be done independently. Let's consider the option currently in the
>>> wiki as option 1 (space). Don't remove this option from the wiki so we
>>> have a reference. The new options for keys and field lookup can be
>>> added as option 2 and 3.
>>>
>>> Option 1 (space): A tightly compact format that is optimized to save
>>> space.
>>> Option 2 (keys): A data model optimized for accessing a list of keys.
>>> Option 3 (lookup): A data model optimized for accessing a field in the
>>> object.
>>>
>>> For option 2 (keys):
>>> Consider the return value for jdm:keys:
>>> jdm:keys($o as object()) as xs:string*
>>> I am not sure I fully understand what xs:string* represents. Is this a
>>> sequence of strings as in XQuery, or an array in JSONiq, or some other
>>> structure? The most optimal way to return the keys would be to store
>>> them in the same way they should be returned. This way you can do a
>>> simple copy to produce the result without processing the result. In
>>> this case, storing them as a sequence (or array) of string values
>>> might be the best option. The values would then need to be a separate
>>> sequence (or array) of typed values in the object data model. Pro:
>>> easy keys function. Con: added a list of offsets for the keys.
>>
>> xs:string* is indeed a sequence of strings
>>
>>> For option 3 (lookup):
>>> This option is independent of option 2. As Till suggested we can
>>> implement this at a later date. We would need a method to improve the
>>> lookup of a field.
>>> Options 1 and 2 require a sequential search of the keys and a string
>>> comparison at each field. The AsterixDB record data model is a little
>>> more complex than I first thought. Take a look at their record
>>> implementation: writing the record [1] (lines 205 to 245 are
>>> interesting) and field lookup [2] (lines 277 to 344). We only need to
>>> consider the open part of the record. (The closed part can be
>>> ignored.)
>>
>> I had another idea for the implementation of the dictionary. We could
>> store the keys in sorted order - while we store the values in the
>> original order. If each key is then followed by the offset to the
>> value, we would get
>> a) a log n access for a value (as the keys are sorted and we can do
>> binary search) and
>> b) the keys in their original order, if we sort them by the offsets.
>> Assuming that the value() access is quite a bit more common than the
>> keys() access this could be a reasonable trade-off.
>>
>>> Comments?
>>
>> Sounds good to list the options on the Wiki page.
>>
>>> Also, what is the actual result of jdm:keys?
>>
>> A sequence of strings.
>>
>>> What is the requirement for the initial implementation?
>>
>> It should be correct and tested.
>>
>> My 2c,
>> Till
>>
>>> [1]
>>> https://github.com/apache/incubator-asterixdb/blob/master/asterixdb/asterix-om/src/main/java/org/apache/asterix/builders/RecordBuilder.java
>>> [2]
>>> https://github.com/apache/incubator-asterixdb/blob/master/asterixdb/asterix-om/src/main/java/org/apache/asterix/dataflow/data/nontagged/serde/ARecordSerializerDeserializer.java
>>>
>>> On Mon, May 9, 2016 at 8:35 AM, Riyafa Abdul Hameed
>>> <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> Is there any documentation I could go through to understand the
>>>> AsterixDB hash code implementation on the open fields? I am not sure
>>>> I understand enough from the AsterixDB serialization [1] to define
>>>> the data model for objects using it.
>>>>
>>>> Sorry about any confusion.
>>>>
>>>> [1]
>>>> https://cwiki.apache.org/confluence/display/ASTERIXDB/AsterixDB+Object+Serialization+Reference
>>>>
>>>> Thank you.
>>>> Riyafa
>>>>
>>>> On 9 May 2016 at 20:16, Michael J. Carey <[email protected]> wrote:
>>>>
>>>>> I think Preston's suggestion of looking at the AsterixDB
>>>>> implementation of its binary data model is a good one, as it shares
>>>>> the efficient field access by name requirements and several VXQuery
>>>>> folks are experts in its details as well. I believe it uses a sorted
>>>>> list instead of a hash table internally, perhaps - slightly simpler
>>>>> for updates perhaps.
>>>>>
>>>>> On May 9, 2016 7:35 AM, "Riyafa Abdul Hameed" <[email protected]>
>>>>> wrote:
>>>>>
>>>>> Hi again,
>>>>>
>>>>> I have been thinking of Till's suggestion of using a dictionary, and
>>>>> I think it would be a better alternative because then we wouldn't
>>>>> have to process the valuetag of the value of a particular key before
>>>>> moving to the next key. Hence it would be easy to implement the
>>>>> jdm:keys method. Any suggestions? Shall I update the wiki and the
>>>>> doc based on this?
>>>>>
>>>>> Thank you.
>>>>> Riyafa
>>>>>
>>>>> On 9 May 2016 at 19:21, Riyafa Abdul Hameed <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Till,
>>>>>>
>>>>>> Currently I have suggested storing each key followed by the value.
>>>>>> This uses less space and is quite similar to storing the offset of
>>>>>> the values and the access is also linear to the number of keys.
>>>>>>
>>>>>> Thanks.
>>>>>> Riyafa
>>>>>>
>>>>>> On 9 May 2016 at 18:54, Till Westmann <[email protected]> wrote:
>>>>>>
>>>>>>> All of this looks pretty good!
>>>>>>>
>>>>>>> Wrt. the question of the dictionary for the fields, I think that
>>>>>>> we should consider the 2 ways that we can access an object:
>>>>>>> 1. Either we get all keys (jdm:keys) or
>>>>>>> 2. we get a value for a key (jdm:value).
>>>>>>>
>>>>>>> To get all the keys efficiently and to be able to skip huge nested
>>>>>>> values a simple approach could be to store a dictionary of the
>>>>>>> keys (in their original order) with pointers (offsets) to the
>>>>>>> values. That way we could get the keys quickly by scanning the
>>>>>>> dictionary and each value by scanning the dictionary + 1 hop to
>>>>>>> find the value. This certainly has the problem that the access is
>>>>>>> linear in the number of the keys. But it is reasonably simple and
>>>>>>> it would allow us to get a correct + testable implementation
>>>>>>> relatively soon and to have a baseline for a more optimized
>>>>>>> representation.
>>>>>>>
>>>>>>> Thoughts?
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Till
>>>>>>>
>>>>>>> [1]
>>>>>>> http://jsoniq.org/docs/JSONiqExtensionToXQuery/html-single/index.html#idm139680641300880
>>>>>>>
>>>>>>> On 8 May 2016, at 22:19, Riyafa Abdul Hameed wrote:
>>>>>>>
>>>>>>>> Hi Preston,
>>>>>>>>
>>>>>>>> I have edited the wiki [1] and the doc [2] based on the comments.
>>>>>>>> Thank you for the suggestions provided. I have removed the part
>>>>>>>> that assigns an id to the keys and instead suggested that the
>>>>>>>> keys be stored in the order they appear in the json object. I am
>>>>>>>> not sure I understand the concept of hashcode--how to generate
>>>>>>>> the hashcodes used for easy lookup?
>>>>>>>>
>>>>>>>> [1] https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
>>>>>>>> [2]
>>>>>>>> https://drive.google.com/open?id=1-wT0pE8rTTNIzuY4iTgvhqkdHmKGek4CgNthXN6mlm0
>>>>>>>>
>>>>>>>> Thank you again.
>>>>>>>>
>>>>>>>> Yours sincerely,
>>>>>>>> Riyafa
>>>>>>>>
>>>>>>>> On 9 May 2016 at 01:23, christina pavlopoulou <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I updated the wiki page according to Preston's comments along
>>>>>>>>> with the json array example in [1].
>>>>>>>>>
>>>>>>>>> [1]
>>>>>>>>> https://docs.google.com/document/d/1GOAcvhw_F9cJrNmRq2TwZxI0wYRmvLEV3mywJS4H9Lg/edit
>>>>>>>>>
>>>>>>>>> Thank you,
>>>>>>>>> Christina
>>>>>>>>>
>>>>>>>>> On 5/8/2016 9:43 AM, Preston Carman wrote:
>>>>>>>>>
>>>>>>>>>> Nice job guys. I can see you are picking up how to create a
>>>>>>>>>> data model. I have limited my comments to the wiki [1] for
>>>>>>>>>> now. At a high level, I was impressed with your detail and
>>>>>>>>>> thoughtful layouts. It reminds me of the age old trade off:
>>>>>>>>>> speed vs space. At this time, let's err on the side of saving
>>>>>>>>>> space. The data model should be as compact as possible.
>>>>>>>>>>
>>>>>>>>>> I also found the AsterixDB serialization [2] we can use as a
>>>>>>>>>> reference. Even though the AsterixDB data model includes
>>>>>>>>>> object length, I would leave that out since all the XQuery
>>>>>>>>>> data models do not include this property.
>>>>>>>>>>
>>>>>>>>>> Riyafa, take a look at the method AsterixDB uses for quick
>>>>>>>>>> look ups (a hash value for the name). Consider the pros and
>>>>>>>>>> cons between your method and AsterixDB's method: a list hash
>>>>>>>>>> value for name and a sorted list of names.
>>>>>>>>>>
>>>>>>>>>> Also, take a look at my wiki comments. It's a great start!
>>>>>>>>>>
>>>>>>>>>> Mahalo,
>>>>>>>>>> Preston
>>>>>>>>>>
>>>>>>>>>> [1] https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
>>>>>>>>>> [2]
>>>>>>>>>> https://cwiki.apache.org/confluence/display/ASTERIXDB/AsterixDB+Object+Serialization+Reference
>>>>>>>>>>
>>>>>>>>>> On Sat, May 7, 2016 at 6:47 PM, christina pavlopoulou
>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I also designed an example for the json array [1] given the
>>>>>>>>>>> description I wrote in the wiki page.
>>>>>>>>>>>
>>>>>>>>>>> [1]
>>>>>>>>>>> https://docs.google.com/document/d/1GOAcvhw_F9cJrNmRq2TwZxI0wYRmvLEV3mywJS4H9Lg/edit
>>>>>>>>>>>
>>>>>>>>>>> Thank you,
>>>>>>>>>>> Christina
>>>>>>>>>>>
>>>>>>>>>>> On 5/7/2016 11:22 AM, Riyafa Abdul Hameed wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I am attempting to create a doc on the JSONiq data model for
>>>>>>>>>>>> objects [1] (it might be full of errors because I am doing
>>>>>>>>>>>> the calculations manually).
>>>>>>>>>>>>
>>>>>>>>>>>> This is what I have come up with for the data model for
>>>>>>>>>>>> objects:
>>>>>>>>>>>>
>>>>>>>>>>>> The first byte would have the value tag, followed by the id
>>>>>>>>>>>> (4 bytes) of the object. Then 4 bytes to represent the size
>>>>>>>>>>>> of the object. Then another four bytes to represent the
>>>>>>>>>>>> number of key-value pairs. Next few bytes represent the
>>>>>>>>>>>> offsets of keys which follow (each offset is represented by
>>>>>>>>>>>> 4 bytes). Ids would be assigned to the keys. Next few bytes
>>>>>>>>>>>> would be a sorted list of ids for keys in alphabetical
>>>>>>>>>>>> order.
>>>>>>>>>>>> The following bytes would represent the keys in the object.
>>>>>>>>>>>> Each key is a StringPointable followed by the id of the key.
>>>>>>>>>>>> Each object would have a sequence pointable: the following
>>>>>>>>>>>> bytes would be the number of items (items are the values for
>>>>>>>>>>>> keys) in the sequence. The next bytes would be the offset of
>>>>>>>>>>>> each item in the sequence. The last bytes would be the
>>>>>>>>>>>> values for each key followed by the respective id of the
>>>>>>>>>>>> key.
>>>>>>>>>>>>
>>>>>>>>>>>> Hope it makes sense.
>>>>>>>>>>>>
>>>>>>>>>>>> My problem is,
>>>>>>>>>>>>
>>>>>>>>>>>> I have not provided for the white spaces in the object. What
>>>>>>>>>>>> can I use to represent the white spaces? I cannot use a text
>>>>>>>>>>>> node because an object is not a node.
>>>>>>>>>>>>
>>>>>>>>>>>> [1]
>>>>>>>>>>>> https://drive.google.com/open?id=1-wT0pE8rTTNIzuY4iTgvhqkdHmKGek4CgNthXN6mlm0
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you.
>>>>>>>>>>>>
>>>>>>>>>>>> Yours sincerely,
>>>>>>>>>>>> Riyafa
>>>>>>>>>>>>
>>>>>>>>>>>> On 26 April 2016 at 10:29, Preston Carman <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> We have two students working with us this summer through
>>>>>>>>>>>>> GSOC to complete the JSONiq specification for arrays and
>>>>>>>>>>>>> objects. I think the first step is to define the data model
>>>>>>>>>>>>> used by JSONiq. The definition should be written up in our
>>>>>>>>>>>>> wiki [1] before coding starts this summer.
>>>>>>>>>>>>> The wiki will allow the community to discuss the JSON data
>>>>>>>>>>>>> model implementation in VXQuery.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I updated the JSONiq wiki to help get the documentation
>>>>>>>>>>>>> started. Please fill in the JSON data model based on the
>>>>>>>>>>>>> examples seen on our website (links on the wiki page).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Post here if you have any questions.
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1] https://cwiki.apache.org/confluence/display/VXQUERY/JSONiq
>>>>>>>>
>>>>>>>> --
>>>>>>>> Riyafa Abdul Hameed
>>>>>>>> Undergraduate, University of Moratuwa
>>>>>>>>
>>>>>>>> Email: [email protected]
>>>>>>>> Website: https://riyafa.wordpress.com/ <http://riyafa.wordpress.com/>
>>>>>>>> <http://facebook.com/riyafa.ahf> <http://lk.linkedin.com/in/riyafa>
>>>>>>>> <http://twitter.com/Riyafa1>

--
Riyafa Abdul Hameed
Undergraduate, University of Moratuwa

Email: [email protected]
Website: https://riyafa.wordpress.com/ <http://riyafa.wordpress.com/>
<http://facebook.com/riyafa.ahf> <http://lk.linkedin.com/in/riyafa>
<http://twitter.com/Riyafa1>
