Good Morning,
Sounds good to me.
Please make sure to read through Martin's commentary over the years; it is
very comprehensive.
I'll look forward to seeing your proposal soon.
Thank you
Lewis

On Monday, February 23, 2015, Aditya Dhulipala <[email protected]> wrote:

> Hi Lewis,
>
> No problem about the delay. Thanks for your reply!
>
> About choosing Avro over protobuf/Thrift: OK, I understand. Using a
> well-founded Apache project makes more sense than using something deployed
> by another organization. Also, the one thing Avro has, namely the schema
> definition being part of the message, also seems more advantageous than
> any of the other protocols.
>
> I understand your arguments against XML. By moving to Avro, we're not only
> eliminating difficulties in XML parsing etc., we're also getting schema
> definitions as part of the client-server exchange instead of having to
> generate XSD (supposing the existing OODT impl had that feature as well).
> So this is doubly advantageous, i.e. we move to JSON (which is lighter to
> parse) and also get XSD-type schema definitions.
> Please correct me if I'm wrong.
>
> Yes. This project sounds more and more exciting the more I learn about it!
> Plus, the impact it would have, as you say, is also a motivation to take it
> up :)
>
> I'll begin working on the proposal, or at least a first version. I will
> have a draft ready by Wednesday.
>
> I'll continue to look into the code, and probably into some more
> Avro-specific material as well.
>
> About picking up OODT issues: understood. I'll do that as well.
>
> Thanks for all the help!
>
> best
> --
> aditya
>
>
> adi
>
> On Sun, Feb 22, 2015 at 8:17 PM, Lewis John Mcgibbney <[email protected]> wrote:
>
> > Hi Aditya,
> > Apologies for the delay on this one :(
> > Thank you for your patience. Please see my inline responses.
> >
> > On Tue, Feb 17, 2015 at 12:31 AM, Aditya Dhulipala <[email protected]>
> > wrote:
> >
> > > Hi Lewis,
> > >
> > > I've been reading up on the doc you provided earlier.
> > >
> >
> > Great
> >
> >
> > >
> > > I've made some progress. I've looked into the filemgr component and run a
> > > few commands to ingest files, etc. I understand how it works now.
> > >
> >
> > Great
> >
> >
> > >
> > > About the potential workflow (this is just my initial understanding; I
> > > could be wrong about this, so please correct me):
> > > I think I have to rewrite the entire component to conform to the Avro
> > > specification. This means I need to define the schema for all the files
> > > inside filemgr/structs: Product.java, ProductPage.java, etc.
> > >
> >
> > Yes, this is correct. The main data structures are documented in Avro
> > specification format in the patch I attached to OODT-685
> > https://issues.apache.org/jira/browse/OODT-658
> > Please check them out.
> > There is an issue here, as the data structures in filemgr depend upon
> > additional data structures, namely Metadata, which is contained within the
> > OODT metadata package.
> >
> >
> > >
> > > I should define the schema for each of these, similar to the one specified
> > > for "User" at this link:
> > > http://avro.apache.org/docs/current/gettingstartedjava.html#Defining+a+schema
> > >
> >
> > Absolutely correct. Please see OODT-685
> >
> >
> > >
> > > Currently, I think this piece of code (Product.java) constructs an XML
> > > file for each product so that the rpcClient can send it over the XML-RPC
> > > interface to the filemgr server.
> >
> >
> > Yes
> >
> >
> > > This project aims to redefine this process to send the data in a binary
> > > encoding (for smaller size, and thus lower latency) by using the Avro
> > > protocol.
> > >
> >
> > Yes, this is correct. It reduces wire transfer and also provides a more
> > flexible model for reading data which has been written by a particular
> > writer. Avro supports schema evolution as well, meaning that data does not
> > need to be static in nature if we consider it from the Avro point of view.
> > This is highly advantageous from a data archival and interoperability
> > point of view.
> >
> >
> > >
> > > And then I should invoke the Avro code generation tools from within
> > > org.apache...system.XmlRpcFileManagerClient (I probably have to rewrite
> > > this module to fit the Avro client specification as well).
> > >
> >
> > ... probably yes. I would imagine that by the time this project is
> > finished, there will be absolutely no references to XML anywhere. It will
> > be entirely replaced by Avro schemas (JSON).
> >
> >
> > >
> > > I should also make the XmlRpcFileManager (server) fit the Avro-specific
> > > implementation of the server interface.
> > >
> >
> > Yes that is correct.
> >
> >
> > >
> > > I think this has to be repeated for all the components within OODT
> > > (workflow manager, etc.).
> > >
> >
> > Absolutely. All key services, e.g. FileMgr, Workflow and Resource.
> >
> >
> > >
> > > I also have some questions:
> > >
> > > 1. Is there any specific reason for picking Avro over Thrift or Protocol
> > > Buffers?
> > >
> >
> > Please read up on some of Martin Kleppmann's blog posts and commentary
> > over the years on this topic:
> >
> > http://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html
> > He did a bunch of work on Avro whilst at LinkedIn, and it will really help
> > you to read through some of his work.
> >
> >
> > > 2. I also came across this answer on Quora on Avro vs. XML-RPC:
> > >
> > > http://www.quora.com/What-merits-does-Avro-RPC-have-over-XML-RPC/answer/Ted-Dunning-1?__snids__=959769040&__nsrc__=1&__filter__=all
> > >
> > > The author talks about another binary format, Simple Binary Encoding, and
> > > recommends using Protocol Buffers for their wide use and documentation.
> > > Can you share your thoughts about this?
> > >
> >
> > Yes, I can.
> >  - Protocol Buffers is described as Google's interchange format. Does this
> > not sound a bit limiting? What happens if you want to change some of the
> > code to fit into OODT? Are you going to fork the project and maintain your
> > own Protocol Buffers implementation?
> >  - @Apache there is a saying: EAT YOUR OWN DOG FOOD. I would much rather we
> > implement a well-founded Apache project, e.g. Avro, over Protocol Buffers
> > any day of the week.
> > Avro is also widely used. It also has a pretty excellent specification
> > document which, as you've already seen, has enabled you to understand
> > schema design.
> > ...
> >
> >
> > >
> > > I'd also like to run some more examples of the filemgr client/server.
> > > That way I can run some commands like these:
> > >
> > > https://cwiki.apache.org/confluence/display/OODT/Exploring+the+OODT+File+Manager+XML-RPC+Interface
> > > and understand the overhead caused by XML-RPC, or get a sense of what the
> > > latency of using XML-RPC is.
> >
> >
> > My main justification for moving towards a replacement for XML-RPC in OODT
> > is multi-faceted:
> >  - the library is dated,
> >  - the plethora of XML in OODT is cumbersome,
> >  - none of the XML is accompanied by XSD,
> >  - Avro has advanced significantly over the years and I am more familiar
> > with it than I am with other data serialization frameworks out there. It
> > defines the Protocol layer, which is a natural replacement for XML-RPC,
> >  - the Google Summer of Code project we are describing here is paving the
> > way for a complete Avro-RPC-powered REST API for each OODT service. This is
> > a HUGE game changer for invoking remote OODT services.
> >
> >
> > > Can you also share examples of filemgr servers
> > > running in the real world that I could query or use?
> > >
> >
> > Most of the running servers I am aware of are on VPNs and internal,
> > secure networks, so the short answer is no.
> > This is something we could get established once you are brought on as
> > the GSoC student for this project, I would think.
> >
> >
> > >
> > > Any other comments/suggestions are welcome! :)
> > >
> > >
> > I would say that it would be really nice for you to turn some of this
> > correspondence into a proposal of sorts. You will require a working
> > proposal when you apply to Google.
> > Also, please feel free, if you have time, to pick up some issues on the
> > OODT Jira tracker. This will go a LONG way toward us backing you as the
> > preferred GSoC applicant.
> > Thank you
> > Lewis
> >
>


-- 
*Lewis*
