Hi. I updated my report on the Wiki [1]. Also, I pushed my latest commits to my branch [2]. Please take a look if you have time.
This week, I will look at the MapReduce tests for DataStores. Please let me know if you have suggestions.

[1] https://cwiki.apache.org/confluence/display/GORA/GORA-485+Apache+Kudu+datastore+for+Gora+Reports
[2] https://github.com/jhnmora000/gora/tree/GORA-485

Thanks,
John

On Sat, Jul 13, 2019 at 19:31, John Mora (<jhnmora...@gmail.com>) wrote:

> Hi all
>
> I updated my report on the Wiki [1]. Also, I pushed my latest commits to my
> branch [2]. Please take a look if you have time.
>
> This week, I will be working on the getPartitions and deleteByQuery
> methods and checking the other tests in the DataStoreTestBase class.
>
> Please let me know if you have suggestions.
>
> [1] https://cwiki.apache.org/confluence/display/GORA/GORA-485+Apache+Kudu+datastore+for+Gora+Reports
> [2] https://github.com/jhnmora000/gora/tree/GORA-485
>
> Best,
> John.
>
> On Wed, Jul 10, 2019 at 16:17, John Mora (<jhnmora...@gmail.com>) wrote:
>
>> Hi Alfonso,
>>
>> Thanks so much for your time and support for this project. I will work on
>> your comments. Responses inline :)
>>
>> On Tue, Jul 9, 2019 at 16:38, Alfonso Nishikawa (<alfonso.nishik...@gmail.com>) wrote:
>>
>>> Hi, John.
>>>
>>> Sorry for the delay, I am changing jobs and I have been very busy :( I
>>> will try to answer your questions :)
>>>
>>> *> In the Employee example there is a field called 'dateOfBirth'. I
>>> tried to map that field to the UNIXTIME_MICROS datatype of Kudu (I
>>> intuitively assumed this is a date). However, in the Java world the
>>> Employee field is a Long value and the Kudu datatype is a Timestamp. So, I
>>> was wondering whether I should force the usage of the UNIXTIME_MICROS
>>> datatype for this field or just use a LONG datatype in Kudu.*
>>>
>>> Avro 1.8 introduced "Logical Types", so there is a "date" type
>>> with an underlying "int" [1]. It's the first time I have read about it, because
>>> until the last version upgrade of Avro it wasn't there.
>>> I would suggest ignoring "dates" and mapping dateOfBirth as long, since in
>>> any case (in Avro) the value is the Unix epoch. After this first approach, a
>>> design improvement would be great, though :)
>>>
>>> - Would it be good to have a "timestamp" type in the mapping so KuduStore
>>> converts between the Entity long field <-> Kudu timestamp storage?
>>> - Is there any other approach?
>>>
>>
>> I think that the Entity long field <-> Kudu timestamp conversion is the
>> best alternative right now, because I would add more compatible datatypes
>> to the mapping parameters that users can use. And this conversion should
>> not be difficult to implement, in my opinion. Also, the new Date datatype of
>> Avro could be implemented in newer versions because it would need further
>> analysis in other datastores too. I will work on that.
>>
>>>
>>> *> What is Gora's policy regarding flush()?*
>>> *> KuduClient has multiple flushing modes
>>> <https://kudu.apache.org/apidocs/org/apache/kudu/client/SessionConfiguration.FlushMode.html>
>>> and can also set a time interval
>>> <https://kudu.apache.org/releases/1.2.0/apidocs/org/apache/kudu/client/KuduSession.html#setFlushInterval-int->
>>> for automatic flush.*
>>> *> Should these behaviors be configurable using the gora.properties file,
>>> or should I just use the default configuration?*
>>>
>>> What we do in HBase is configure an autoflush option in gora.properties
>>> [2] which is used when instantiating the Table, but at the same time we
>>> implement the flush() method to force the flush [3]. I would suggest
>>> following that example, but adding the flushing options of Kudu. What flushing
>>> mode (and time interval if it applies) do you suggest?
>>>
>>
>> Well, IMHO the default flush mode (auto flush sync) will do the job for
>> most use cases. But I will add a configuration in gora.properties for
>> selecting the other modes and specifying an autoflush time if needed by
>> the user.
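
A minimal sketch of the gora.properties flush configuration proposed above. The property keys, the class name KuduFlushSettings, and the 1000 ms fallback interval are assumptions invented for illustration and are not taken from the Gora code base. The enum only mirrors the constant names of Kudu's SessionConfiguration.FlushMode so that the example compiles without the Kudu client on the classpath.

```java
import java.util.Locale;
import java.util.Properties;

/** Hypothetical holder for flush settings read from gora.properties. */
final class KuduFlushSettings {

  /** Same constant names as Kudu's SessionConfiguration.FlushMode. */
  enum Mode { AUTO_FLUSH_SYNC, AUTO_FLUSH_BACKGROUND, MANUAL_FLUSH }

  // Assumed property keys, invented for this sketch.
  static final String MODE_KEY = "gora.datastore.kudu.flushMode";
  static final String INTERVAL_KEY = "gora.datastore.kudu.flushInterval";
  // Assumed fallback interval in milliseconds.
  static final int DEFAULT_INTERVAL_MS = 1000;

  final Mode mode;
  final int intervalMs;

  private KuduFlushSettings(Mode mode, int intervalMs) {
    this.mode = mode;
    this.intervalMs = intervalMs;
  }

  /** Falls back to AUTO_FLUSH_SYNC when no mode is configured. */
  static KuduFlushSettings fromProperties(Properties props) {
    String rawMode = props.getProperty(MODE_KEY, Mode.AUTO_FLUSH_SYNC.name());
    // Mode.valueOf throws IllegalArgumentException for unknown mode names.
    Mode mode = Mode.valueOf(rawMode.trim().toUpperCase(Locale.ROOT));
    String rawInterval =
        props.getProperty(INTERVAL_KEY, String.valueOf(DEFAULT_INTERVAL_MS));
    int intervalMs = Integer.parseInt(rawInterval.trim());
    if (intervalMs < 0) {
      throw new IllegalArgumentException(
          "Flush interval must not be negative: " + intervalMs);
    }
    return new KuduFlushSettings(mode, intervalMs);
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty(MODE_KEY, "manual_flush");
    props.setProperty(INTERVAL_KEY, "250");
    KuduFlushSettings settings = fromProperties(props);
    System.out.println(settings.mode + " every " + settings.intervalMs + " ms");
  }
}
```

In a real store, the parsed values would presumably be handed to KuduSession.setFlushMode and KuduSession.setFlushInterval, with flush() still implemented to force a flush on demand, as in the HBase example.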
>>
>>>
>>> *> Also, while reviewing the datastore interface I noticed this method
>>> 'getPartitions(Query<K, T> query)'. What is the expected behavior of this
>>> method? Should I use the partition definition in the XML mapping file for
>>> this?*
>>>
>>> The method getPartitions(Query) is related to Hadoop. Apache Gora
>>> integrates with Hadoop by implementing a custom Map and Reduce that allow
>>> getting/writing Entities directly.
>>> You can take a look at HBase's implementation [4], which relies on
>>> o.a.h.hbase.mapreduce.TableInputFormatBase
>>> [5] to compute the splits (start key---end key) with the location of the
>>> split to create a collection of partitions [6].
>>>
>>> So, if Kudu allows performing computation using local Kudu splits,
>>> then this method does the needed preparation to allow "sending the
>>> computation to where the data is locally".
>>>
>>> In any case, you can see that:
>>>
>>> - MongoDB store implementation does not implement splitting [7]
>>> - Cassandra store implementation does not implement splitting [8]
>>> - Aerospike store implementation does not implement splitting [9]
>>> - Accumulo store implementation *does* implement splitting [10]
>>>
>>> If Kudu has a method to get the different splits for a table and their
>>> locations, then you will be able to implement the full feature.
>>>
>>> This is Hadoop related and it is not trivial. I haven't elaborated much,
>>> so if you find you need more information let me know :)
>>>
>>
>> I will check whether Kudu has these features in order to implement this
>> method. If not, I will use the default implementation found in other
>> backends.
>>
>>> About Queries, what I can tell is that HBase only implements "Start key"
>>> + "End key" because it has only 2 operations: "get" and "scan", and the
>>> querying is for the "scan" operation, where you want an interval (or all) of the
>>> rows. Does Kudu have more querying functionality?
>>>
>>
>> Yes, Kudu implements a Scanner for querying data, along with conditional
>> predicates for filtering. I am using those classes.
>>
>>> On another topic, I am trying to install Kudu in standalone mode (all in 1
>>> node). Do you use a Cloudera installation or do you have a standalone
>>> installation? How do you do it? I found some instructions, but they talk
>>> about compiling Kudu [11]. I was looking for something like HBase, where it
>>> is unzip + execute "hbase start".
>>>
>>
>> I am using an embedded mini-cluster which comes with compiled binaries
>> and can be used with Maven [1] for testing my code. Once it is mature
>> enough, I think I will test the datastore with a Docker container [2].
>> I could not find an unzip+execute bundle either, and I am kind of a noob at
>> compiling it myself.
>>
>> [1] https://kudu.apache.org/docs/developing.html#_jvm_based_integration_testing
>> [2] https://hub.docker.com/r/usuresearch/apache-kudu/
>>
>>> Good job and thank you!!
>>> :)
>>>
>>> Regards,
>>>
>>> Alfonso Nishikawa
>>>
>>> [1] - https://avro.apache.org/docs/1.8.0/spec.html#Logical+Types
>>> [2] - https://github.com/apache/gora/blob/apache-gora-0.9/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java#L175
>>> [3] - https://github.com/apache/gora/blob/apache-gora-0.9/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java#L458
>>> [4] - https://github.com/apache/gora/blob/apache-gora-0.9/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java#L472
>>> [5] - https://github.com/apache/gora/blob/apache-gora-0.9/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java#L479
>>> [6] - https://github.com/apache/gora/blob/apache-gora-0.9/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java#L517
>>> [7] - https://github.com/apache/gora/blob/apache-gora-0.9/gora-mongodb/src/main/java/org/apache/gora/mongodb/store/MongoStore.java#L533
>>> [8] - https://github.com/apache/gora/blob/apache-gora-0.9/gora-cassandra/src/main/java/org/apache/gora/cassandra/store/CassandraStore.java#L292
>>> [9] - https://github.com/apache/gora/blob/apache-gora-0.9/gora-aerospike/src/main/java/org/apache/gora/aerospike/store/AerospikeStore.java#L369
>>> [10] - https://github.com/apache/gora/blob/apache-gora-0.9/gora-accumulo/src/main/java/org/apache/gora/accumulo/store/AccumuloStore.java#L902
>>> [11] - https://kudu.apache.org/docs/installation.html
>>>
>>> On Mon, Jul 8, 2019 at 3:42, John Mora (<jhnmora...@gmail.com>) wrote:
>>>
>>>> Hi all.
>>>>
>>>> As every week, I updated my report on the Wiki [1]. Also, I pushed my
>>>> latest commits to my branch [2]. Please take a look if you have time.
>>>>
>>>> This week, I will continue working on the Queries implementation;
>>>> please reach out to me if you have any suggestions.
>>>>
>>>> Also, while reviewing the datastore interface I noticed this method
>>>> 'getPartitions(Query<K, T> query)'.
>>>> What is the expected behavior of this
>>>> method? Should I use the partition definition in the XML mapping file for
>>>> this?
>>>>
>>>> Cheers,
>>>> John.
>>>>
>>>> [1] https://cwiki.apache.org/confluence/display/GORA/GORA-485+Apache+Kudu+datastore+for+Gora+Reports
>>>> [2] https://github.com/jhnmora000/gora/tree/GORA-485
>>>>
>>>> On Sun, Jun 30, 2019 at 16:56, John Mora (<jhnmora...@gmail.com>) wrote:
>>>>
>>>>> Hi all.
>>>>>
>>>>> I received my first evaluation from the Google Summer of Code program
>>>>> with a positive result. Thanks so much for your support and confidence in
>>>>> the project and me.
>>>>>
>>>>> I updated my report for this week on the Wiki [1]. Also, I pushed my
>>>>> latest commits to my branch [2].
>>>>>
>>>>> This week, I will be reviewing the serialization/deserialization
>>>>> process in order to identify optimizations specific to Kudu, because I
>>>>> used generic methods from other backends which could probably be better
>>>>> tuned for Kudu. Also, I will start working on the Queries implementation.
>>>>>
>>>>> BTW, I added a question to the wiki about Date types. Please take a
>>>>> look if you have time.
>>>>>
>>>>> [1] https://cwiki.apache.org/confluence/display/GORA/GORA-485+Apache+Kudu+datastore+for+Gora+Reports
>>>>> [2] https://github.com/jhnmora000/gora/tree/GORA-485
>>>>>
>>>>> Cheers,
>>>>> John
>>>>>
>>>>> On Thu, Jun 27, 2019 at 21:02, John Mora (<jhnmora...@gmail.com>) wrote:
>>>>>
>>>>>> Hi Carlos.
>>>>>>
>>>>>> Thanks for the reminder. I submitted the form yesterday. :D
>>>>>>
>>>>>> Best,
>>>>>> John.
>>>>>>
>>>>>> On Thu, Jun 27, 2019 at 17:34, carlos muñoz (<carlosr...@gmail.com>) wrote:
>>>>>>
>>>>>>> Hi John
>>>>>>>
>>>>>>> The first Google Summer of Code evaluation is due on June 28th.
>>>>>>> Please make sure you submit your Mentors' evaluation on time.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Carlos
>>>>>>>
>>>>>>> On Sun, Jun 23, 2019 at 18:29, John Mora (<jhnmora...@gmail.com>) wrote:
>>>>>>>
>>>>>>>> Hi all.
>>>>>>>>
>>>>>>>> FYI, I updated my report for this week on the Wiki [1]. Also, I
>>>>>>>> pushed my latest commits to my branch [2].
>>>>>>>>
>>>>>>>> As I mentioned in the reports, I would like to know how datastores
>>>>>>>> deal with flush(): should it always be executed manually?
>>>>>>>>
>>>>>>>> Finally, this week I will be implementing object
>>>>>>>> serialization/deserialization in the methods put, get, delete, exists.
>>>>>>>> Do you have any suggestions on how to proceed with this task?
>>>>>>>>
>>>>>>>> Footnote: Thanks for the feedback Carlos, I fixed the problem.
>>>>>>>>
>>>>>>>> [1] https://cwiki.apache.org/confluence/display/GORA/GORA-485+Apache+Kudu+datastore+for+Gora+Reports
>>>>>>>> [2] https://github.com/jhnmora000/gora/tree/GORA-485
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> John
>>>>>>>>
>>>>>>>> On Mon, Jun 17, 2019 at 22:58, carlos muñoz (<carlosr...@gmail.com>) wrote:
>>>>>>>>
>>>>>>>>> Hi John
>>>>>>>>>
>>>>>>>>> Your last changes look good to me. Keep it up. But I noticed that
>>>>>>>>> you have created an Enumeration for datatypes, which is very similar
>>>>>>>>> to the kudu-client's [2]. You should probably replace [1] with [2] in
>>>>>>>>> order to avoid code duplication.
>>>>>>>>>
>>>>>>>>> [1] https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/src/main/java/org/apache/gora/kudu/mapping/Column.java#L76
>>>>>>>>> [2] https://kudu.apache.org/apidocs/org/apache/kudu/Type.html
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Carlos
>>>>>>>>>
>>>>>>>>> On Sat, Jun 15, 2019 at 12:01, John Mora (<jhnmora...@gmail.com>) wrote:
>>>>>>>>>
>>>>>>>>>> Hi all.
>>>>>>>>>>
>>>>>>>>>> I updated my report for this week on the Wiki [1].
>>>>>>>>>> I noticed that
>>>>>>>>>> my code is lacking some Javadoc documentation; I think I will be
>>>>>>>>>> working on that this week. I would also like to enable and check the
>>>>>>>>>> schema management tests (createSchema, existsSchema, etc.).
>>>>>>>>>>
>>>>>>>>>> [1] https://cwiki.apache.org/confluence/display/GORA/GORA-485+Apache+Kudu+datastore+for+Gora+Reports
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> John.
>>>>>>>>>>
>>>>>>>>>> On Tue, Jun 11, 2019 at 0:11, John Mora (<jhnmora...@gmail.com>) wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Alfonso.
>>>>>>>>>>>
>>>>>>>>>>> Thanks so much for your feedback. I am working on your comments.
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> John
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jun 10, 2019 at 16:11, Alfonso Nishikawa (<alfonso.nishik...@gmail.com>) wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi, John.
>>>>>>>>>>>>
>>>>>>>>>>>> Regarding your questions in the report [1]:
>>>>>>>>>>>>
>>>>>>>>>>>> - How to represent partitioning configurations in the
>>>>>>>>>>>> mapping file.
>>>>>>>>>>>>
>>>>>>>>>>>> This was discussed in other emails, wasn't it? :)
>>>>>>>>>>>>
>>>>>>>>>>>> - KuduTestHarness requires the Maven plugin
>>>>>>>>>>>> os-maven-plugin, which needs Maven 3.1.1+. Is it a problem for
>>>>>>>>>>>> Apache Gora?
>>>>>>>>>>>>
>>>>>>>>>>>> I believe it is not a problem. My Ubuntu comes with 3.6.0, far
>>>>>>>>>>>> ahead of 3.1.1, and I assume everyone uses a fairly recent version
>>>>>>>>>>>> of Maven 3 :)
>>>>>>>>>>>>
>>>>>>>>>>>> [1] - https://cwiki.apache.org/confluence/display/GORA/GORA-485+Apache+Kudu+datastore+for+Gora+Reports
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>>
>>>>>>>>>>>> Alfonso Nishikawa
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Jun 10, 2019 at 21:07, Alfonso Nishikawa (<alfonso.nishik...@gmail.com>) wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi, John.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thank you!
>>>>>>>>>>>>> Things I have seen:
>>>>>>>>>>>>>
>>>>>>>>>>>>> - The version of a Maven dependency [1] should go in the
>>>>>>>>>>>>> Dependency Management of the root pom [2]. Same for [3] and the
>>>>>>>>>>>>> ones after it; the version should not be set there.
>>>>>>>>>>>>> - Set test dependencies' scope to test, at [4] and the ones after it.
>>>>>>>>>>>>> - Set the indentation to 2 spaces for the pom [5].
>>>>>>>>>>>>> - Missing "t" in "localhost" at [6].
>>>>>>>>>>>>> - Port 13 for Kudu? That is the "Daytime Protocol" (RFC 867) and you
>>>>>>>>>>>>> will need root permission to run it. The default port for Kudu is
>>>>>>>>>>>>> 7051, isn't it?
>>>>>>>>>>>>> - I would ask you to add the same functionality to load the
>>>>>>>>>>>>> mapping from configuration as in HBase's store [7] to your
>>>>>>>>>>>>> KuduStore [8]. This will have implications for your readMapping at
>>>>>>>>>>>>> [9], so take a look at the one for HBase at [10].
>>>>>>>>>>>>> - I know it is in other backends, but avoid RuntimeExceptions
>>>>>>>>>>>>> (at least in Java, since we have the checked ones) like in [11].
>>>>>>>>>>>>> You can wrap them in GoraException. An example is [12].
>>>>>>>>>>>>>
>>>>>>>>>>>>> And nothing more :)
>>>>>>>>>>>>> Keep going, good job.
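
The last review point (wrap failures in a checked exception instead of throwing RuntimeException) could look roughly like the sketch below. StoreException is a stand-in so the example compiles without Gora on the classpath; in Gora the checked type would be GoraException. MappingLoader, readTableName, and the "table" key are hypothetical names used only for illustration and are not the KuduMappingBuilder API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

/** Stand-in for a checked store exception such as GoraException. */
class StoreException extends Exception {
  StoreException(String message, Throwable cause) {
    super(message, cause);
  }
}

/** Hypothetical loader that reports failures through a checked exception. */
final class MappingLoader {

  private MappingLoader() {
  }

  /** Reads the (assumed) "table" entry; any failure is wrapped, not rethrown raw. */
  static String readTableName(InputStream in) throws StoreException {
    try {
      Properties props = new Properties();
      props.load(in);
      String table = props.getProperty("table");
      if (table == null) {
        throw new IllegalStateException("missing 'table' entry");
      }
      return table;
    } catch (IOException | RuntimeException e) {
      // Keep the original exception as the cause so the stack trace survives.
      throw new StoreException("Could not read mapping", e);
    }
  }

  public static void main(String[] args) throws StoreException {
    byte[] data = "table=employee".getBytes(StandardCharsets.ISO_8859_1);
    System.out.println(readTableName(new ByteArrayInputStream(data)));
  }
}
```

Callers are then forced by the compiler to handle or declare the failure, which is the reason given above for preferring checked exceptions here.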
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1] - https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/pom.xml#L98
>>>>>>>>>>>>> [2] - https://github.com/jhnmora000/gora/blob/GORA-485/pom.xml#L890
>>>>>>>>>>>>> [3] - https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/pom.xml#L121
>>>>>>>>>>>>> [4] - https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/pom.xml#L180
>>>>>>>>>>>>> [5] - https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/pom.xml
>>>>>>>>>>>>> [6] - https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/src/test/resources/gora.properties#L18
>>>>>>>>>>>>> [7] - https://github.com/jhnmora000/gora/blob/master/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java#L92
>>>>>>>>>>>>> [8] - https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/src/main/java/org/apache/gora/kudu/store/KuduStore.java#L53
>>>>>>>>>>>>> [9] - https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/src/main/java/org/apache/gora/kudu/mapping/KuduMappingBuilder.java#L81
>>>>>>>>>>>>> [10] - https://github.com/jhnmora000/gora/blob/master/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java#L822
>>>>>>>>>>>>> [11] - https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/src/main/java/org/apache/gora/kudu/mapping/KuduMappingBuilder.java#L141
>>>>>>>>>>>>> [12] - https://github.com/jhnmora000/gora/blob/master/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java#L268
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Alfonso Nishikawa
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sat, Jun 8, 2019 at 20:26, John Mora (<jhnmora...@gmail.com>) wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi all.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have just updated my weekly reports on Cwiki [1].
>>>>>>>>>>>>>> This next
>>>>>>>>>>>>>> week I think I should focus on the create schema operation and on
>>>>>>>>>>>>>> solving the issue of the partitioning configurations in the
>>>>>>>>>>>>>> mapping file.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Please let me know if you have suggestions; my latest commits
>>>>>>>>>>>>>> are available here [2].
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [1] https://cwiki.apache.org/confluence/display/GORA/GORA-485+Apache+Kudu+datastore+for+Gora+Reports
>>>>>>>>>>>>>> [2] https://github.com/jhnmora000/gora/tree/GORA-485
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> John