Thanks Aditya, I will send a longer answer shortly, but I didn’t see the link to the first version of the proposal. Can you resend it?
------------------------
Chris Mattmann
[email protected]

-----Original Message-----
From: Aditya Dhulipala <[email protected]>
Date: Wednesday, February 25, 2015 at 9:54 PM
To: Chris Mattmann <[email protected]>
Cc: "[email protected]" <[email protected]>
Subject: Re: GSoC 2015

>Hi professor,
>
>Thanks for your support!
>
>I saw that a very large portion of the code was written by you. I'm
>guessing I will be interacting with you a lot on this (assuming my
>application for GSoC goes through).
>
>I've posted a link to a first version of the project proposal in the
>above email. I'd like to get some feedback on it so that I can polish it.
>
>Currently, I've written about the general overview of the project and a
>broad description of the tasks. I'm still trying to get comfortable with
>the codebase and come up with a schedule of work, milestones,
>deliverables, etc. Can you have a look at the proposal and let me know
>what you think about it?
>
>Also, can you give me some pointers on how to use OODT on some dataset? I
>think I may have the employment dataset from 572 last semester. Can you
>give me some ideas on how to use it with OODT, just to get a sense of how
>OODT works?
>
>Here's one I'm thinking of:
>I think OODT is useful for storing and indexing structured data, through
>the use of metadata files (.met) and indexing based on these to answer
>queries. For unstructured data, Lucene/Solr is a great tool. But the
>employment dataset is not completely unstructured, nor does it have a
>consistent structure. It's slightly structured in the sense that we can
>guess what fields are there in records, and move forward from there,
>right?
>So far the only use case I can think of is using OODT to crawl the
>dataset and push it into Solr, then query OODT through the command line
>(like in the OODT wiki examples), i.e. using Solr syntax or SQL syntax.
>
>Is this a valid use case for OODT? I think people would rather just query
>Solr directly, right?
>Is there any reason for OODT to act as an in-between?
>
>Any more ideas, comments, suggestions?
>
>Thanks!
>
>--
>Aditya
>
>adi
>
>On Wed, Feb 25, 2015 at 7:19 PM, Chris Mattmann
><[email protected]> wrote:
>
>This sounds fabulous. I will be keen to help.
>
>------------------------
>Chris Mattmann
>[email protected]
>
>-----Original Message-----
>From: Lewis John Mcgibbney <[email protected]>
>Reply-To: <[email protected]>
>Date: Wednesday, February 25, 2015 at 1:54 PM
>To: Aditya Dhulipala <[email protected]>
>Cc: "[email protected]" <[email protected]>
>Subject: Re: GSoC 2015
>
>>I think that you should aim to implement it on all components and we
>>should be looking to merge the code into OODT (branch) incrementally.
>>It is OK that you may not get every component ported to Avro RPC; what is
>>important is that there is an optimistic but realistic GSoC put forward.
>>That is what we are looking for.
>>Thank you
>>Lewis
>>
>>On Wed, Feb 25, 2015 at 1:48 PM, Aditya Dhulipala <[email protected]>
>>wrote:
>>
>>> Hi Lewis,
>>>
>>> Thanks for your reply!
>>>
>>> Your responses have helped immensely when I'm stuck on something!
>>>
>>> In the proposal that I was preparing I had listed out all the components
>>> that would require schema definitions, and then when I checked the OODT
>>> patch 658, I realized that a lot of this was done for the Gora project.
>>> But your email has clarified that I can use that as a starting point for
>>> the Avro project. This is extremely useful.
>>>
>>> And thanks for the rest of the info as well (about ensuring backwards
>>> compatibility, testing, regression testing). Now I have a much better
>>> idea of formulating a proposal (and the project to-dos also).
>>>
>>> I'll have it ready ASAP.
>>> I will post it to the group by end of today
>>> so that I can get more feedback on it.
>>>
>>> I think I should at least be able to define Avro RPC implementations for
>>> one of the components of OODT in the GSoC duration, right?
>>> Define the schema
>>> Implement the services
>>> Write unit tests
>>> Regression test against XML-RPC
>>>
>>> Hopefully I should implement it for more than one component, but I'm
>>> still not able to estimate the workload. I'll continue reading up on this.
>>>
>>> I'll continue to work on the proposal and keep you updated.
>>>
>>> Thanks for all the help! I think if I start early, then I can spend the
>>> summer coding from the beginning.
>>>
>>> Thanks!
>>>
>>> --
>>> Aditya
>>>
>>> adi
>>>
>>> On Wed, Feb 25, 2015 at 9:34 AM, Lewis John Mcgibbney <
>>> [email protected]> wrote:
>>>
>>>> Hi Adi,
>>>>
>>>> On Wed, Feb 25, 2015 at 12:34 AM, Aditya Dhulipala <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Lewis,
>>>>>
>>>>> I was going through the patch you posted earlier, OODT-658:
>>>>> https://issues.apache.org/jira/browse/OODT-658
>>>>>
>>>>> I think this is a substantial part of the project we're currently
>>>>> talking about (XML-RPC overhaul).
>>>>>
>>>>
>>>> Substantial may be a wee bit optimistic ;) But yes, a significant
>>>> portion of the thinking into the OODT data structures logic has been
>>>> done. We DO need to implement Metadata in exactly the right way without
>>>> losing existing functionality, so please begin to think about that.
>>>>
>>>> https://github.com/apache/oodt/blob/trunk/metadata/src/main/java/org/apache/oodt/cas/metadata/Metadata.java
>>>>
>>>>> My understanding is that this patch was implemented to make Apache Gora
>>>>> communicate with OODT, so that's why you've implemented the schema
>>>>> definitions for all the data structures used by OODT.
>>>>>
>>>>
>>>> Correct
>>>>
>>>>> Gora generates some statically typed code from this schema
>>>>>
>>>>
>>>> Using the GoraCompiler
>>>> http://gora.apache.org/current/compiler.html,
>>>> invoked via CompilerCLI
>>>>
>>>>> and the next step is to implement OODT logic to store the data in Gora
>>>>> (as opposed to MySQL or Solr)
>>>>>
>>>>
>>>> YES. This will tidy A LOT of the current configuration up. Will also
>>>> have a unified and well documented manner for configuring the mappings
>>>> and datastore specific configuration. All of the Gora datastores are
>>>> documented here:
>>>> http://gora.apache.org/current/index.html
>>>> I've been hacking away on documentation for Gora for about a year so it
>>>> is now relatively OK. I hope you find it useful.
>>>>
>>>>> So from the viewpoint of the project we're talking about, i.e.
>>>>> replacing XML-RPC with Avro,
>>>>> the schema definition part is pretty much done (or almost done? Need to
>>>>> define it within OODT as well?).
>>>>>
>>>>
>>>> Note that NONE of the Avro RPC logic is implemented. So it is nowhere
>>>> near done ;) The core project definition is still to be addressed and I
>>>> am nearly 100% sure that we will have some tricky issues to address
>>>> regarding 1) maintaining as close to backwards compatibility as
>>>> possible, 2) documenting the entire Avro RPC communications within OODT,
>>>> 3) hooking up all services, 4) testing the new implementation,
>>>> 5) regression testing it against the existing XML-RPC layer, 6) setting
>>>> a roadmap for deprecation and eventual removal of the XML-RPC material.
>>>>
>>>>> The next step would be to define RPC logic for the client server
>>>>> communication within OODT itself, i.e. within filemgr, workflowmgr etc.
>>>>>
>>>>
>>>> Correct, this should make up the majority of your proposal OK.
>>>>
>>>>> Am I correct in understanding this?
>>>>>
>>>>
>>>> Yes, and thank you for joining the dots. It is nice to see a student
>>>> interpreting and investigating the problem this much prior to the
>>>> project starting. I am really looking forward to this now.
>>>> Thanks
>>>> Lewis
>>>>
>>
>>--
>>*Lewis*
