Re: GSoC 2015

Chris Mattmann Wed, 25 Feb 2015 22:26:05 -0800

Thanks Aditya, I will send a longer
answer shortly, but I didn’t see the link to the
first version of the proposal. Can you resend it?


------------------------
Chris Mattmann
[email protected]




-----Original Message-----
From: Aditya Dhulipala <[email protected]>
Date: Wednesday, February 25, 2015 at 9:54 PM
To: Chris Mattmann <[email protected]>
Cc: "[email protected]" <[email protected]>
Subject: Re: GSoC 2015

>Hi professor,
>
>Thanks for your support!
>
>I saw that a very large portion of the code was written by you. I'm
>guessing I will be interacting with you a lot on this (assuming my
>application for GSoC goes through)
>
>I've posted a link to a first-version of the project proposal in the
>above email. I'd like to get some feedback on it so that I can polish it.
>
>Currently, I've written a general overview of the project and a
>broad description of the tasks. I'm still getting comfortable with
>the codebase and trying to come up with a schedule of work, milestones &
>deliverables, etc. Can you have a look at the proposal and let me know
>what you think about it?
>
>Also, can you give me some pointers on how to use OODT on some dataset? I
>think I may have the employment dataset from 572 last semester. Can you
>give me some ideas on how to use it with OODT, just to get a sense of how
>OODT works?
>
>Here's one I'm thinking of:-
>I think OODT is useful for storing and indexing structured data through
>the use of metadata files (.met), and for answering queries based on that
>index. For unstructured data, Lucene/Solr is a great tool. But the
>employment dataset is not completely unstructured, nor does it have a
>consistent structure. It's slightly structured in the sense that we can
>guess what fields are in the records and move forward from there,
>right?
>So far the only use case I can think of is using OODT to crawl the
>dataset and push it into Solr, then querying OODT through the command line
>(like in the OODT wiki examples), i.e. using Solr syntax or SQL syntax.
>
>Is this a valid use case for OODT? I think people would rather just query
>Solr directly, right? Is there any reason for OODT to act as an
>in-between?
>
>
>Any more ideas, comments, suggestions?
>
>Thanks!
>
>--
>Aditya
>
>
>
>
>adi
>
>
>
>On Wed, Feb 25, 2015 at 7:19 PM, Chris Mattmann
><[email protected]> wrote:
>
>This sounds fabulous. I will be keen to help.
>
>------------------------
>Chris Mattmann
>[email protected]
>
>
>
>
>-----Original Message-----
>From: Lewis John Mcgibbney <[email protected]>
>Reply-To: <[email protected]>
>Date: Wednesday, February 25, 2015 at 1:54 PM
>To: Aditya Dhulipala <[email protected]>
>Cc: "[email protected]" <[email protected]>
>Subject: Re: GSoC 2015
>
>>I think that you should aim to implement it on all components, and we
>>should be looking to merge the code into OODT (branch) incrementally.
>>It is OK that you may not get every component ported to Avro RPC; what is
>>important is that an optimistic but realistic GSoC proposal is put forward.
>>That is what we are looking for.
>>Thank you
>>Lewis
>>
>>
>>On Wed, Feb 25, 2015 at 1:48 PM, Aditya Dhulipala <[email protected]>
>>wrote:
>>
>>> Hi Lewis,
>>>
>>> Thanks for your reply!
>>>
>>> Your responses have helped immensely when I'm stuck on something!
>>>
>>> In the proposal that I was preparing I had listed out all the components
>>> that would require schema definitions, and then when I checked the OODT
>>> patch 658, I realized that a lot of this was done for the Gora project. But
>>> your email has clarified that I can use that as a starting point for the
>>> Avro project. This is extremely useful.
>>>
>>> And thanks for the rest of the info as well (about ensuring backwards
>>> compatibility, testing, regression testing). Now I have a much better
>>> idea of how to formulate a proposal (and the project to-dos also).
>>>
>>> I will have it ready ASAP. I will post it to the group by end of today
>>> so that I can get more feedback on it.
>>>
>>> I think I should at least be able to define Avro RPC implementations for
>>> one of the components of OODT in the GSoC duration, right?
>>> 1) Define the schema
>>> 2) Implement the services
>>> 3) Write unit tests
>>> 4) Regression test against XML-RPC
>>>
>>> Hopefully I can implement it for more than one component, but I'm still
>>> not able to estimate the workload. I'll continue reading up on this.
>>>
>>> I'll continue to work on the proposal and keep you updated.
>>>
>>> Thanks for all the help! I think if I start early, then I can spend the
>>> summer coding from the beginning.
>>>
>>> Thanks!
>>>
>>> --
>>> Aditya
>>>
>>>
>>> adi
>>>
>>> On Wed, Feb 25, 2015 at 9:34 AM, Lewis John Mcgibbney <
>>> [email protected]> wrote:
>>>
>>>> Hi Adi,
>>>>
>>>> On Wed, Feb 25, 2015 at 12:34 AM, Aditya Dhulipala <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Lewis,
>>>>>
>>>>> I was going through the patch you posted earlier, OODT-658:
>>>>> https://issues.apache.org/jira/browse/OODT-658
>>>>>
>>>>> I think this is a substantial part of the project we're currently
>>>>> talking about (XML-RPC overhaul).
>>>>>
>>>>
>>>> Substantial may be a wee bit optimistic ;) But yes, a significant portion
>>>> of the thinking into the OODT data structures logic has been done. We DO
>>>> need to implement Metadata in exactly the right way without losing existing
>>>> functionality, so please begin to think about that.
>>>>
>>>>
>>>> https://github.com/apache/oodt/blob/trunk/metadata/src/main/java/org/apache/oodt/cas/metadata/Metadata.java
>>>>
>>>>
>>>>> My understanding is that this patch was implemented to make Apache
>>>>>Gora
>>>>> communicate with OODT, so that's why you've implemented the schema
>>>>> definitions for all the data structures used by OODT.
>>>>>
>>>>
>>>> Correct
>>>>
>>>>
>>>>> Gora generates some statically typed code from this schema
>>>>>
>>>>
>>>> Using the GoraCompiler
>>>> http://gora.apache.org/current/compiler.html,
>>>> invoked via CompilerCLI
>>>>
>>>>
>>>>
>>>>> and the next step is to implement OODT logic to store the data in
>>>>>Gora
>>>>> (as opposed to MySQL or Solr)
>>>>>
>>>>
>>>> YES. This will tidy A LOT of the current configuration up. We will also
>>>> have a unified and well documented manner for configuring the mappings and
>>>> datastore specific configuration. All of the Gora datastores are documented
>>>> here
>>>> http://gora.apache.org/current/index.html
>>>> I've been hacking away on documentation for Gora for about a year so it
>>>> is now relatively OK. I hope you find it useful.
>>>>
>>>>
>>>>>
>>>>> So from the viewpoint of the project we're talking about i.e.
>>>>>Replacing
>>>>> XML-RPC with Avro,
>>>>> the schema definition part is pretty much done (or almost done? Need
>>>>>to
>>>>> define it within OODT as well?).
>>>>>
>>>>
>>>> Note that NONE of the Avro RPC logic is implemented, so it is nowhere
>>>> near done ;) The core project definition is still to be addressed and I
>>>> am nearly 100% sure that we will have some tricky issues to address
>>>> regarding 1) maintaining as close to backwards compatibility as possible,
>>>> 2) documenting the entire Avro RPC communications within OODT, 3) hooking
>>>> up all services, 4) testing the new implementation, 5) regression testing
>>>> it against the existing XML-RPC layer, 6) setting a roadmap for deprecation
>>>> and eventual removal of the XML-RPC material.
>>>>
>>>>
>>>>> The next step would be to define RPC logic for the client server
>>>>> communication within OODT itself i.e. within filemgr, workflowmgr
>>>>>etc.
>>>>>
>>>>
>>>> Correct, this should make up the majority of your proposal OK.
>>>>
>>>>
>>>>>
>>>>> Am I correct in understanding this?
>>>>>
>>>>>
>>>> Yes, and thank you for joining the dots. It is nice to see a student
>>>> interpreting and investigating the problem this much prior to the project
>>>> starting. I am really looking forward to this now.
>>>> Thanks
>>>> Lewis
>>>>
>>>
>>>
>>
>>
>>--
>
>
>>*Lewis*
>
>
>
>
>
>
>
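
The regression-testing step mentioned in the thread (checking a new RPC layer against the existing XML-RPC one) could be sketched roughly as below. This is only an illustration in Python using the standard library, not OODT code: the function names and sample records are hypothetical, the metadata is assumed to be a map from key to a list of string values, and JSON is used purely as a stand-in for the new transport because Avro is not in the standard library.

```python
import json
import xmlrpc.client


def xmlrpc_round_trip(metadata):
    """Marshal metadata into an XML-RPC response and parse it back."""
    payload = xmlrpc.client.dumps((metadata,), methodresponse=True)
    params, _method_name = xmlrpc.client.loads(payload)
    return params[0]


def candidate_round_trip(metadata):
    """Stand-in for the new transport (JSON here, NOT Avro)."""
    return json.loads(json.dumps(metadata))


def parity_report(samples, baseline, candidate):
    """Return the samples on which candidate and baseline disagree."""
    return [s for s in samples if baseline(s) != candidate(s)]


# Hypothetical multi-valued metadata records (key -> list of string values).
SAMPLES = [
    {"Filename": ["jobs.tsv"], "ProductType": ["GenericFile"]},
    {"Keywords": ["employment", "572"], "Empty": []},
]

mismatches = parity_report(SAMPLES, xmlrpc_round_trip, candidate_round_trip)
print("mismatches:", mismatches)
```

An empty mismatch list would suggest the two transports agree on these samples; a real check would swap the stand-in for the actual Avro client and use records taken from existing OODT tests.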
