Hi all,

I updated my report on the Wiki [1]. Also, I pushed my latest commits to my branch [2]. Please give it a look if you have time.

This week, I will be working on the getPartitions and deleteByQuery methods and running the remaining tests in the DataStoreTestBase class. Please let me know if you have suggestions.

[1] https://cwiki.apache.org/confluence/display/GORA/GORA-485+Apache+Kudu+datastore+for+Gora+Reports
[2] https://github.com/jhnmora000/gora/tree/GORA-485

Best,
John.

On Wed, 10 Jul 2019 at 16:17, John Mora (<jhnmora...@gmail.com>) wrote:

> Hi Alfonso,
>
> Thanks so much for your time and support for this project. I will work on your comments. Responses inline :)
>
> On Tue, 9 Jul 2019 at 16:38, Alfonso Nishikawa (<alfonso.nishik...@gmail.com>) wrote:
>
>> Hi, John.
>>
>> Sorry for the delay; I am changing jobs and have been very busy :( I will try to answer your questions :)
>>
>> *> In the Employee example there is a field called 'dateOfBirth'. I tried to map that field to Kudu's UNIXTIME_MICROS datatype (I intuitively assumed it is a date). However, in the Java world the Employee field is a Long value and the Kudu datatype is a timestamp. So I was wondering whether I should force the use of the UNIXTIME_MICROS datatype for this field or just use a LONG datatype in Kudu.*
>>
>> Avro 1.8 introduced "Logical Types", so there is now a "date" type with an underlying "int" [1]. It is the first time I have read about it, because it was not there before the last Avro version upgrade. I would suggest ignoring "dates" and mapping dateOfBirth as a long, since in any case, in Avro, the value is the Unix epoch. After this first approach, a design improvement would be great, though :)
>>
>> - Would it be good to have a "timestamp" type in the mapping, so that KuduStore converts between the Entity long field <-> Kudu timestamp storage?
>> - Is there any other approach?
>>
>
> I think the Entity long field <-> Kudu timestamp conversion is the best alternative right now.
> It would add more compatible datatypes to the mapping parameters that users can use, and in my opinion this conversion should not be difficult to implement. Also, the new Avro date datatype could be implemented in later versions, because it would need further analysis in the other datastores too. I will work on that.
>
>> *> What is Gora's policy regarding flush()?*
>> *> KuduClient has multiple flushing modes <https://kudu.apache.org/apidocs/org/apache/kudu/client/SessionConfiguration.FlushMode.html> and can also set a time interval <https://kudu.apache.org/releases/1.2.0/apidocs/org/apache/kudu/client/KuduSession.html#setFlushInterval-int-> for automatic flush.*
>> *> Should these behaviors be configurable using the gora.properties file, or should I just use the default configuration?*
>>
>> What we do in HBase is configure an autoflush option in gora.properties [2], which is used when the Table is instantiated, but at the same time we implement the flush() method to force the flush [3]. I would suggest following that example, but adding Kudu's flushing options. What flushing mode (and time interval, if it applies) do you suggest?
>>
>
> Well, IMHO the default flush mode (auto flush sync) will do the job for most use cases. But I will add a configuration option in gora.properties for selecting the other modes and specifying an autoflush interval if the user needs it.
>
>> *> Also, while reviewing the datastore interface I noticed the method 'getPartitions(Query<K, T> query)'. What is the expected behavior of this method? Should I use the partition definition in the XML mapping file for this?*
>>
>> The getPartitions(Query) method is related to Hadoop. Apache Gora integrates with Hadoop by implementing a custom Map and Reduce that allows Entities to be read and written directly.
>> You can take a look at HBase's implementation [4], which relies on o.a.h.hbase.mapreduce.TableInputFormatBase [5] to compute the splits (start key to end key) together with the location of each split, in order to create a collection of partitions [6].
>>
>> So, if Kudu allows computation to be performed on local Kudu splits, then this method does the preparation needed to "send the computation to where the data is locally".
>>
>> In any case, you can see that:
>>
>> - MongoDB store implementation does not implement splitting [7]
>> - Cassandra store implementation does not implement splitting [8]
>> - Aerospike store implementation does not implement splitting [9]
>> - Accumulo store implementation *does* implement splitting [10]
>>
>> If Kudu has a method to get the different splits of a table and their locations, then you will be able to implement the full feature.
>>
>> This is Hadoop related and it is not trivial. I haven't elaborated much, so if you find you need more information, let me know :)
>>
>
> I will check whether Kudu has these features in order to implement this method. If not, I will use the default implementation found in other backends.
>
>> About Queries, what I can tell you is that HBase only implements "Start key" + "End key" because it has only 2 operations, "get" and "scan", and the querying is for the "scan" operation, where you want an interval (or all) of the rows. Does Kudu have more querying functionality?
>>
>
> Yes, Kudu implements a Scanner for querying data, along with conditional predicates for filtering. I am using those classes.
>
>> On another topic, I am trying to install Kudu in standalone mode (all on 1 node). Do you use a Cloudera installation or do you have a standalone installation? How do you do it? I found some instructions, but they talk about compiling Kudu [11]. I was looking for something like HBase, where it is just unzip + execute "hbase start".
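
To make the getPartitions discussion above more concrete, here is a minimal sketch of the underlying idea: turning a table's split boundaries into key-range partitions, each tagged with the node that holds the data. It is a toy model in plain Java, not Gora or Kudu code. `KeyRangePartition`, `PartitionSketch`, and `fromSplits` are hypothetical names, and the sketch assumes the store can report its sorted split keys plus one location per resulting range.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a partition: a key range plus the host that serves it. */
final class KeyRangePartition {
  final String startKey; // inclusive; null means "from the first key"
  final String endKey;   // exclusive; null means "up to the last key"
  final String location; // host holding this range

  KeyRangePartition(String startKey, String endKey, String location) {
    this.startKey = startKey;
    this.endKey = endKey;
    this.location = location;
  }
}

final class PartitionSketch {
  /**
   * Builds one partition per key range. N sorted split keys delimit N + 1
   * ranges, so exactly N + 1 locations are expected (an assumption of this sketch).
   */
  static List<KeyRangePartition> fromSplits(List<String> splitKeys, List<String> locations) {
    if (locations.size() != splitKeys.size() + 1) {
      throw new IllegalArgumentException("expected one location per key range");
    }
    List<KeyRangePartition> partitions = new ArrayList<>();
    String start = null; // the first range is open at the start
    for (int i = 0; i < locations.size(); i++) {
      // The last range has no upper bound.
      String end = i < splitKeys.size() ? splitKeys.get(i) : null;
      partitions.add(new KeyRangePartition(start, end, locations.get(i)));
      start = end; // the next range begins where this one ends
    }
    return partitions;
  }
}
```

With split keys "g" and "p" and three hosts, this yields the ranges [first key, g), [g, p), and [p, last key), one per host. A Hadoop input format could then schedule one task per range close to its host. Whether the Kudu client exposes equivalent split and location information is the open question in the thread.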
>>
>
> I am using an embedded mini-cluster, which comes with compiled binaries and can be used with Maven [1] for testing my code. Once the datastore is mature enough, I think I will test it with a Docker container [2]. I could not find an unzip + execute bundle either, and I am not experienced enough to compile it myself.
>
> [1] https://kudu.apache.org/docs/developing.html#_jvm_based_integration_testing
> [2] https://hub.docker.com/r/usuresearch/apache-kudu/
>
>> Good job and thank you!! :)
>>
>> Regards,
>>
>> Alfonso Nishikawa
>>
>> [1] - https://avro.apache.org/docs/1.8.0/spec.html#Logical+Types
>> [2] - https://github.com/apache/gora/blob/apache-gora-0.9/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java#L175
>> [3] - https://github.com/apache/gora/blob/apache-gora-0.9/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java#L458
>> [4] - https://github.com/apache/gora/blob/apache-gora-0.9/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java#L472
>> [5] - https://github.com/apache/gora/blob/apache-gora-0.9/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java#L479
>> [6] - https://github.com/apache/gora/blob/apache-gora-0.9/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java#L517
>> [7] - https://github.com/apache/gora/blob/apache-gora-0.9/gora-mongodb/src/main/java/org/apache/gora/mongodb/store/MongoStore.java#L533
>> [8] - https://github.com/apache/gora/blob/apache-gora-0.9/gora-cassandra/src/main/java/org/apache/gora/cassandra/store/CassandraStore.java#L292
>> [9] - https://github.com/apache/gora/blob/apache-gora-0.9/gora-aerospike/src/main/java/org/apache/gora/aerospike/store/AerospikeStore.java#L369
>> [10] - https://github.com/apache/gora/blob/apache-gora-0.9/gora-accumulo/src/main/java/org/apache/gora/accumulo/store/AccumuloStore.java#L902
>> [11] - https://kudu.apache.org/docs/installation.html
>>
>> On Mon, 8 Jul 2019 at 3:42, John Mora (<jhnmora...@gmail.com>) wrote:
>>
>>> Hi all.
>>>
>>> As every week, I updated my report on the Wiki [1]. Also, I pushed my latest commits to my branch [2]. Please give it a look if you have time.
>>>
>>> This week, I will continue working on the Queries implementation; please reach out if you have any suggestions.
>>>
>>> Also, while reviewing the datastore interface I noticed the method 'getPartitions(Query<K, T> query)'. What is the expected behavior of this method? Should I use the partition definition in the XML mapping file for this?
>>>
>>> Cheers,
>>> John.
>>>
>>> [1] https://cwiki.apache.org/confluence/display/GORA/GORA-485+Apache+Kudu+datastore+for+Gora+Reports
>>> [2] https://github.com/jhnmora000/gora/tree/GORA-485
>>>
>>> On Sun, 30 Jun 2019 at 16:56, John Mora (<jhnmora...@gmail.com>) wrote:
>>>
>>>> Hi all.
>>>>
>>>> I received my first evaluation from the Google Summer of Code program, with a positive result. Thanks so much for your support and your confidence in the project and in me.
>>>>
>>>> I updated this week's report on the Wiki [1]. Also, I pushed my latest commits to my branch [2].
>>>>
>>>> This week, I will be reviewing the serialization/deserialization process in order to identify optimizations specific to Kudu, because I used generic methods from other backends that could probably be better tuned for Kudu. Also, I will start working on the Queries implementation.
>>>>
>>>> BTW, I added a question to the wiki about Date types. Please give it a look if you have time.
>>>>
>>>> [1] https://cwiki.apache.org/confluence/display/GORA/GORA-485+Apache+Kudu+datastore+for+Gora+Reports
>>>> [2] https://github.com/jhnmora000/gora/tree/GORA-485
>>>>
>>>> Cheers,
>>>> John
>>>>
>>>> On Thu, 27 Jun 2019 at 21:02, John Mora (<jhnmora...@gmail.com>) wrote:
>>>>
>>>>> Hi Carlos.
>>>>>
>>>>> Thanks for the reminder.
>>>>> I submitted the form yesterday. :D
>>>>>
>>>>> Best,
>>>>> John.
>>>>>
>>>>> On Thu, 27 Jun 2019 at 17:34, carlos muñoz (<carlosr...@gmail.com>) wrote:
>>>>>
>>>>>> Hi John
>>>>>>
>>>>>> The first Google Summer of Code evaluation is due on June 28th. Please make sure you submit your Mentors' evaluation on time.
>>>>>>
>>>>>> Regards,
>>>>>> Carlos
>>>>>>
>>>>>> On Sun, 23 Jun 2019 at 18:29, John Mora (<jhnmora...@gmail.com>) wrote:
>>>>>>
>>>>>>> Hi all.
>>>>>>>
>>>>>>> FYI, I updated this week's report on the Wiki [1]. Also, I pushed my latest commits to my branch [2].
>>>>>>>
>>>>>>> As I mentioned in the reports, I would like to know how datastores deal with flush(): should it always be executed manually?
>>>>>>>
>>>>>>> Finally, this week I will be implementing object serialization/deserialization in the put, get, delete, and exists methods. Do you have any suggestions on how to proceed with this task?
>>>>>>>
>>>>>>> Footnote: Thanks for the feedback, Carlos; I fixed the problem.
>>>>>>>
>>>>>>> [1] https://cwiki.apache.org/confluence/display/GORA/GORA-485+Apache+Kudu+datastore+for+Gora+Reports
>>>>>>> [2] https://github.com/jhnmora000/gora/tree/GORA-485
>>>>>>>
>>>>>>> Cheers,
>>>>>>> John
>>>>>>>
>>>>>>> On Mon, 17 Jun 2019 at 22:58, carlos muñoz (<carlosr...@gmail.com>) wrote:
>>>>>>>
>>>>>>>> Hi John
>>>>>>>>
>>>>>>>> Your latest changes look good to me. Keep it up. However, I noticed that you have created an enumeration for datatypes which is very similar to the kudu-client's [2]. You should probably replace [1] with [2] in order to avoid code duplication.
>>>>>>>>
>>>>>>>> [1] https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/src/main/java/org/apache/gora/kudu/mapping/Column.java#L76
>>>>>>>> [2] https://kudu.apache.org/apidocs/org/apache/kudu/Type.html
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Carlos
>>>>>>>>
>>>>>>>> On Sat, 15 Jun 2019 at 12:01, John Mora (<jhnmora...@gmail.com>) wrote:
>>>>>>>>
>>>>>>>>> Hi all.
>>>>>>>>>
>>>>>>>>> I updated this week's report on the Wiki [1]. I noticed that my code is lacking some Javadoc documentation, so I think I will be working on that this week. I would also like to enable and check the schema management tests (createSchema, existsSchema, etc.).
>>>>>>>>>
>>>>>>>>> [1] https://cwiki.apache.org/confluence/display/GORA/GORA-485+Apache+Kudu+datastore+for+Gora+Reports
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> John.
>>>>>>>>>
>>>>>>>>> On Tue, 11 Jun 2019 at 0:11, John Mora (<jhnmora...@gmail.com>) wrote:
>>>>>>>>>
>>>>>>>>>> Hi Alfonso.
>>>>>>>>>>
>>>>>>>>>> Thanks so much for your feedback. I am working on your comments.
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> John
>>>>>>>>>>
>>>>>>>>>> On Mon, 10 Jun 2019 at 16:11, Alfonso Nishikawa (<alfonso.nishik...@gmail.com>) wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi, John.
>>>>>>>>>>>
>>>>>>>>>>> Regarding your questions in the report [1]:
>>>>>>>>>>>
>>>>>>>>>>> - How to represent partitioning configurations in the mapping file.
>>>>>>>>>>>
>>>>>>>>>>> This was discussed in other emails, wasn't it? :)
>>>>>>>>>>>
>>>>>>>>>>> - KuduTestHarness requires the Maven plugin os-maven-plugin, which needs Maven 3.1.1+. Is that a problem for Apache Gora?
>>>>>>>>>>>
>>>>>>>>>>> I believe it is not a problem.
>>>>>>>>>>> My Ubuntu comes with Maven 3.6.0, well beyond 3.1.1, and I assume everyone uses a fairly recent version of Maven 3 :)
>>>>>>>>>>>
>>>>>>>>>>> [1] - https://cwiki.apache.org/confluence/display/GORA/GORA-485+Apache+Kudu+datastore+for+Gora+Reports
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>>
>>>>>>>>>>> Alfonso Nishikawa
>>>>>>>>>>>
>>>>>>>>>>> On Mon, 10 Jun 2019 at 21:07, Alfonso Nishikawa (<alfonso.nishik...@gmail.com>) wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi, John.
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you!
>>>>>>>>>>>> Things I have seen:
>>>>>>>>>>>>
>>>>>>>>>>>> - The version of a Maven dependency [1] should go in the Dependency Management section of the root pom [2]. The same applies to [3] and the dependencies after it: the module pom should not set the version.
>>>>>>>>>>>> - Set the scope of the test dependencies to test, at [4] and the ones after it.
>>>>>>>>>>>> - Set the indentation to 2 spaces for the pom [5].
>>>>>>>>>>>> - Missing "t" in "localhost" at [6].
>>>>>>>>>>>> - Port 13 for Kudu? That is the "Daytime Protocol" (RFC 867), and you will need root permission to run on it. The default port for Kudu is 7051, isn't it?
>>>>>>>>>>>> - I would ask you to add to your KuduStore [8] the same functionality for loading the mapping from configuration as in HBase's store [7]. This will have implications for your readMapping at [9], so take a look at the one for HBase at [10].
>>>>>>>>>>>> - I know it is done in other backends, but avoid RuntimeExceptions (at least in Java, since we have the checked ones) like in [11]. You can wrap them in GoraException. An example is [12].
>>>>>>>>>>>>
>>>>>>>>>>>> And nothing more :)
>>>>>>>>>>>> Keep going, good job.
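
The last review point above (wrap failures in a checked exception instead of throwing RuntimeException) can be sketched roughly as follows. This is a self-contained illustration, not the KuduMappingBuilder code: `StoreException` is a local stand-in for a checked exception such as GoraException, and `MappingReaderSketch`, `loadMapping`, and the returned XML string are hypothetical.

```java
import java.io.IOException;

/** Local stand-in for a checked store exception such as GoraException (hypothetical class). */
class StoreException extends Exception {
  StoreException(String message, Throwable cause) {
    super(message, cause);
  }
}

final class MappingReaderSketch {
  /** Hypothetical low-level loader that fails with an IOException. */
  static String loadMapping(String resource) throws IOException {
    if (resource == null || resource.isEmpty()) {
      throw new IOException("mapping resource not found");
    }
    return "<gora-otd/>"; // placeholder content for the sketch
  }

  /** Wraps the low-level failure in the checked exception, keeping the cause. */
  static String readMapping(String resource) throws StoreException {
    try {
      return loadMapping(resource);
    } catch (IOException e) {
      // Instead of "throw new RuntimeException(e)", surface a checked
      // exception so that callers are forced to handle the failure.
      throw new StoreException("Could not read mapping '" + resource + "'", e);
    }
  }
}
```

Keeping the original exception as the cause preserves the stack trace for debugging, while the checked type makes the failure part of the method's contract.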
>>>>>>>>>>>>
>>>>>>>>>>>> [1] - https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/pom.xml#L98
>>>>>>>>>>>> [2] - https://github.com/jhnmora000/gora/blob/GORA-485/pom.xml#L890
>>>>>>>>>>>> [3] - https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/pom.xml#L121
>>>>>>>>>>>> [4] - https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/pom.xml#L180
>>>>>>>>>>>> [5] - https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/pom.xml
>>>>>>>>>>>> [6] - https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/src/test/resources/gora.properties#L18
>>>>>>>>>>>> [7] - https://github.com/jhnmora000/gora/blob/master/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java#L92
>>>>>>>>>>>> [8] - https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/src/main/java/org/apache/gora/kudu/store/KuduStore.java#L53
>>>>>>>>>>>> [9] - https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/src/main/java/org/apache/gora/kudu/mapping/KuduMappingBuilder.java#L81
>>>>>>>>>>>> [10] - https://github.com/jhnmora000/gora/blob/master/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java#L822
>>>>>>>>>>>> [11] - https://github.com/jhnmora000/gora/blob/GORA-485/gora-kudu/src/main/java/org/apache/gora/kudu/mapping/KuduMappingBuilder.java#L141
>>>>>>>>>>>> [12] - https://github.com/jhnmora000/gora/blob/master/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseStore.java#L268
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>>
>>>>>>>>>>>> Alfonso Nishikawa
>>>>>>>>>>>>
>>>>>>>>>>>> On Sat, 8 Jun 2019 at 20:26, John Mora (<jhnmora...@gmail.com>) wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi all.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have just updated my weekly reports on Cwiki [1].
>>>>>>>>>>>>> This coming week I think I should focus on the create schema operation and on solving the issue of the partitioning configurations in the mapping file.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Please let me know if you have suggestions. My latest commits are available here [2].
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1] https://cwiki.apache.org/confluence/display/GORA/GORA-485+Apache+Kudu+datastore+for+Gora+Reports
>>>>>>>>>>>>> [2] https://github.com/jhnmora000/gora/tree/GORA-485
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> John
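
On the flush() discussion further up the thread (default to auto flush sync, with the other modes and an interval selectable in gora.properties), the configuration side could look roughly like this. It is only a sketch under stated assumptions: the property keys are hypothetical and not existing Gora settings, and the enum is a local mirror of the three mode names in Kudu's SessionConfiguration.FlushMode, so that the example runs without the kudu-client dependency.

```java
import java.util.Locale;
import java.util.Properties;

final class FlushConfigSketch {
  /** Local mirror of the mode names in Kudu's SessionConfiguration.FlushMode. */
  enum FlushMode { AUTO_FLUSH_SYNC, AUTO_FLUSH_BACKGROUND, MANUAL_FLUSH }

  // Hypothetical property keys, chosen only for this illustration.
  static final String FLUSH_MODE_KEY = "gora.datastore.kudu.flush.mode";
  static final String FLUSH_INTERVAL_KEY = "gora.datastore.kudu.flush.interval";

  /** Returns the configured mode, or AUTO_FLUSH_SYNC when the key is absent or blank. */
  static FlushMode flushMode(Properties props) {
    String raw = props.getProperty(FLUSH_MODE_KEY);
    if (raw == null || raw.trim().isEmpty()) {
      return FlushMode.AUTO_FLUSH_SYNC;
    }
    // valueOf throws IllegalArgumentException for an unknown mode name.
    return FlushMode.valueOf(raw.trim().toUpperCase(Locale.ROOT));
  }

  /** Returns the flush interval in milliseconds, or -1 when none is configured. */
  static int flushIntervalMillis(Properties props) {
    String raw = props.getProperty(FLUSH_INTERVAL_KEY);
    if (raw == null || raw.trim().isEmpty()) {
      return -1;
    }
    int millis = Integer.parseInt(raw.trim());
    if (millis <= 0) {
      throw new IllegalArgumentException("flush interval must be positive: " + millis);
    }
    return millis;
  }
}
```

In a real store, the parsed values would presumably be passed on to the Kudu session when it is created, while flush() on the store would still force a flush regardless of the mode, as the HBase store does.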