Hi Lewis,

Ok. I'll check it out. Thanks!
-- adi

On Thu, Feb 26, 2015 at 10:32 AM, Lewis John Mcgibbney <[email protected]> wrote:

> Hi Adi,
> Please see DRAT for a flagship application which displays OODT
> https://github.com/chrismattmann/drat
>
> On Wed, Feb 25, 2015 at 9:54 PM, Aditya Dhulipala <[email protected]> wrote:
>
> > Hi professor,
> >
> > Thanks for your support!
> >
> > I saw that a very large portion of the code was written by you. I'm
> > guessing I will be interacting with you a lot on this (assuming my
> > application for GSoC goes through).
> >
> > I've posted a link to a first version of the project proposal in the
> > above email. I'd like to get some feedback on it so that I can polish it.
> >
> > Currently, I've written about the general overview of the project and a
> > broad description of the tasks. I'm still trying to get comfortable with
> > the codebase and come up with a schedule of work, milestones &
> > deliverables, etc. Can you have a look at the proposal and let me know
> > what you think about it?
> >
> > Also, can you give me some pointers on how to use OODT on some dataset? I
> > think I may have the employment dataset from 572 last semester. Can you
> > give me some ideas on how to use it with OODT, just to get a sense of how
> > OODT works etc.?
> >
> > Here's one I'm thinking of:
> > I think OODT is useful for storing and indexing structured data, through
> > the use of metadata files (.met) and indexing based on these to answer
> > queries. For unstructured data Lucene/Solr is a great tool. But the
> > employment dataset is not completely unstructured, nor does it have a
> > consistent structure. It's slightly structured in the sense that we can
> > guess what fields are there in records, and move forward from there, right?
> > So far the only use case I can think of is using OODT to crawl the
> > dataset and push it into Solr. Then query OODT through the cmd line (like
> > in the OODT wiki examples), i.e. using Solr syntax or SQL syntax.
> >
> > Is this a valid use case for OODT? I think people would rather just query
> > Solr directly, right? Is there any reason for OODT to act as an in-between?
> >
> > Any more ideas, comments, suggestions?
> >
> > Thanks!
> >
> > --
> > Aditya
> >
> > adi
> >
> > On Wed, Feb 25, 2015 at 7:19 PM, Chris Mattmann <[email protected]> wrote:
> >
> > > This sounds fabulous. I will be keen to help.
> > >
> > > ------------------------
> > > Chris Mattmann
> > > [email protected]
> > >
> > > -----Original Message-----
> > > From: Lewis John Mcgibbney <[email protected]>
> > > Reply-To: <[email protected]>
> > > Date: Wednesday, February 25, 2015 at 1:54 PM
> > > To: Aditya Dhulipala <[email protected]>
> > > Cc: "[email protected]" <[email protected]>
> > > Subject: Re: GSoC 2015
> > >
> > > >I think that you should aim to implement it on all components and we
> > > >should be looking to merge the code into OODT (branch) incrementally.
> > > >It is OK that you may not get every component ported to Avro RPC; what
> > > >is important is that there is an optimistic but realistic GSoC proposal
> > > >put forward. That is what we are looking for.
> > > >Thank you
> > > >Lewis
> > > >
> > > >On Wed, Feb 25, 2015 at 1:48 PM, Aditya Dhulipala <[email protected]> wrote:
> > > >
> > > >> Hi Lewis,
> > > >>
> > > >> Thanks for your reply!
> > > >>
> > > >> Your responses have helped immensely when I'm stuck on something!
> > > >>
> > > >> In the proposal that I was preparing I had listed out all the
> > > >> components that would require schema definitions, and then when I
> > > >> checked the OODT patch 658, I realized that a lot of this was done for
> > > >> the Gora project. But your email has clarified that I can use that as
> > > >> a starting point for the Avro project.
> > > >> This is extremely useful.
> > > >>
> > > >> And thanks for the rest of the info as well (about ensuring backwards
> > > >> compatibility, testing, regression testing). Now I have a much better
> > > >> idea of how to formulate a proposal (and the project to-dos also).
> > > >>
> > > >> I will have it ready ASAP. I will post it to the group by end of
> > > >> today so that I can get more feedback on it.
> > > >>
> > > >> I think I should at least be able to define Avro RPC implementations
> > > >> for one of the components of OODT in the GSoC duration, right?
> > > >> Define the schema
> > > >> Implement the services
> > > >> Write unit tests
> > > >> Regression test against XML-RPC
> > > >>
> > > >> Hopefully I should implement it for more than one component, but I'm
> > > >> still not able to estimate the workload. I'll continue reading up on
> > > >> this.
> > > >>
> > > >> I'll continue to work on the proposal and keep you updated.
> > > >>
> > > >> Thanks for all the help! I think if I start early, then I can spend
> > > >> the summer coding from the beginning.
> > > >>
> > > >> Thanks!
> > > >>
> > > >> --
> > > >> Aditya
> > > >>
> > > >> adi
> > > >>
> > > >> On Wed, Feb 25, 2015 at 9:34 AM, Lewis John Mcgibbney <[email protected]> wrote:
> > > >>
> > > >>> Hi Adi,
> > > >>>
> > > >>> On Wed, Feb 25, 2015 at 12:34 AM, Aditya Dhulipala <[email protected]> wrote:
> > > >>>
> > > >>>> Hi Lewis,
> > > >>>>
> > > >>>> I was going through the patch you posted earlier, OODT-658:
> > > >>>> https://issues.apache.org/jira/browse/OODT-658
> > > >>>>
> > > >>>> I think this is a substantial part of the project we're currently
> > > >>>> talking about (XML-RPC overhaul).
> > > >>>>
> > > >>>
> > > >>> Substantial may be a wee bit optimistic ;) But yes, a significant
> > > >>> portion of the thinking into the OODT data structures logic has been
> > > >>> done.
> > > >>> We DO need to implement Metadata in exactly the right way without
> > > >>> losing existing functionality, so please begin to think about that.
> > > >>>
> > > >>> https://github.com/apache/oodt/blob/trunk/metadata/src/main/java/org/apache/oodt/cas/metadata/Metadata.java
> > > >>>
> > > >>>> My understanding is that this patch was implemented to make Apache
> > > >>>> Gora communicate with OODT, so that's why you've implemented the
> > > >>>> schema definitions for all the data structures used by OODT.
> > > >>>>
> > > >>>
> > > >>> Correct
> > > >>>
> > > >>>> Gora generates some statically typed code from this schema
> > > >>>>
> > > >>>
> > > >>> Using the GoraCompiler
> > > >>> http://gora.apache.org/current/compiler.html,
> > > >>> invoked via CompilerCLI
> > > >>>
> > > >>>> and the next step is to implement OODT logic to store the data in
> > > >>>> Gora (as opposed to MySQL or Solr)
> > > >>>>
> > > >>>
> > > >>> YES. This will tidy A LOT of the current configuration up. We will
> > > >>> also have a unified and well documented manner for configuring the
> > > >>> mappings and datastore-specific configuration. All of the Gora
> > > >>> datastores are documented here:
> > > >>> http://gora.apache.org/current/index.html
> > > >>> I've been hacking away on documentation for Gora for about a year so
> > > >>> it is now relatively OK. I hope you find it useful.
> > > >>>
> > > >>>> So from the viewpoint of the project we're talking about, i.e.
> > > >>>> replacing XML-RPC with Avro, the schema definition part is pretty
> > > >>>> much done (or almost done? Need to define it within OODT as well?).
> > > >>>>
> > > >>>
> > > >>> Note that NONE of the Avro RPC logic is implemented.
> > > >>> So it is nowhere near done ;) The core project definition is still
> > > >>> to be addressed and I am nearly 100% sure that we will have some
> > > >>> tricky issues to address regarding 1) maintaining as close to
> > > >>> backwards compatibility as possible, 2) documenting the entire Avro
> > > >>> RPC communications within OODT, 3) hooking up all services,
> > > >>> 4) testing the new implementation, 5) regression testing it against
> > > >>> the existing XML-RPC layer, 6) setting a roadmap for deprecation and
> > > >>> eventual removal of the XML-RPC material.
> > > >>>
> > > >>>> The next step would be to define RPC logic for the client-server
> > > >>>> communication within OODT itself, i.e. within filemgr, workflowmgr
> > > >>>> etc.
> > > >>>>
> > > >>>
> > > >>> Correct, this should make up the majority of your proposal OK.
> > > >>>
> > > >>>> Am I correct in understanding this?
> > > >>>>
> > > >>>
> > > >>> Yes, and thank you for joining the dots. It is nice to see a student
> > > >>> interpreting and investigating the problem this much prior to the
> > > >>> project starting. I am really looking forward to this now.
> > > >>> Thanks
> > > >>> Lewis
> > > >>>
> > > >
> > > >--
> > > >*Lewis*
> > >
>
> --
> *Lewis*
