Hi Lewis,

I've been reading up on the doc you provided earlier, and I've made some progress. I've looked into the filemgr component and run a few commands to ingest files etc. I understand how it works now.

About the potential workflow (this is just my initial understanding; I could be wrong, so please correct me):

I think I have to rewrite the entire component to conform to the Avro specification. This means I need to define the schema for all the files inside filemanager/structs -- Product.java, ProductPage.java etc. I should define the schema for each of these, similar to the one specified for "User" on this link:
http://avro.apache.org/docs/current/gettingstartedjava.html#Defining+a+schema

Currently, I think this piece of code (Product.java) constructs an XML file for each product so that the rpcClient can send it over the xml-rpc interface to the filemgr server. This project aims to redefine that process to send the data in a binary encoding (for smaller size, and thus lower latency) by using the Avro protocol.

Then I should invoke the Avro code generation tools from within org.apache...system.XmlRpcFileManagerClient (I will probably have to rewrite this module to fit the Avro client specification as well). I should also make the XmlRpcFileManager (server) fit the Avro-specific implementation of the server interface.

I think this has to be repeated for all the components within OODT (workflow manager etc).

I also have some questions:

1. Is there any specific reason for picking Avro over Thrift or Protocol Buffers?

2. I also came across this answer on Quora on Avro vs. XML-RPC:
http://www.quora.com/What-merits-does-Avro-RPC-have-over-XML-RPC/answer/Ted-Dunning-1?__snids__=959769040&__nsrc__=1&__filter__=all
The author talks about another binary format, Simple Binary Encoding, and recommends using Protocol Buffers for their wide use and documentation. Can you share your thoughts about this?

I'd also like to run some more examples of the filemgr client/server.
That way I can run some commands like these
https://cwiki.apache.org/confluence/display/OODT/Exploring+the+OODT+File+Manager+XML-RPC+Interface
and understand the overhead caused by xml-rpc, or at least get a sense of the latency of using xml-rpc.

Can you also share examples of filemgr servers running in the real world that I could query or use?

Any other comments/suggestions are welcome! :)

Thanks!

--
Aditya

adi

On Tue, Feb 10, 2015 at 11:29 PM, Lewis John Mcgibbney <[email protected]> wrote:

> Hi Aditya,
>
> On Tue, Feb 10, 2015 at 11:05 PM, Aditya Dhulipala <[email protected]>
> wrote:
>
> > Hi Lewis,
> >
> > Thanks for the info!
> >
>
> No problems, thanks for looking into this one. If this project goes ahead
> I can guarantee you it will be a MAJOR contribution to OODT.
>
> >
> > I've been looking at the code you've posted above and trying to get a
> > sense of what it does.
> > I'm looking through the imports and I found some schema definitions in
> > XmlStructFactory & XmlRpcStructFactory that make this suitable for
> > transfer over xml-rpc.
> >
>
> Correct
>
> >
> > My guess is that this project would be to write .avpr files (Avro files)
> > that define these file manager structs
>
> This would be my advice. We first define the Schema and Protocol
> implementations, then we move towards functional implementations.
>
> > & then replace all the xml-rpc
> > (XmlStructFactory & XmlRpcStructFactory) with Avro specific
> > implementations like this
> > https://github.com/phunt/avro-rpc-quickstart
> >
>
> Yes, although Patrick's code is a 'little' out of date (running on 1.7.4,
> from memory, whereas Avro is running on 1.7.7). Patrick has captured very
> conveniently some excellent integration and definitions for us.
>
> >
> > I understand what Avro is and how it's better than xmlrpc for performance
> > reasons. I'm interested in reading more about this topic, and in
> > particular how it affects various components of OODT.
> >
>
> Well, this is exactly what needs to be within a proposal for a GSoC project
> so I am VERY keen to help you out here.
>
> >
> > Can you point me in the right direction?
> >
>
> So, some more higher level Javadoc
>
> https://cwiki.apache.org/confluence/display/OODT/Apache+OODT+APIs#ApacheOODTAPIs-XML-RPC
>
> You will also see that the data structures in the OODT services packages
> e.g. File Manager, Workflow Manager and Resource Manager are always located
> in a /structs/ package. These are the fundamental data structures we need
> to expose when thinking about the Avro Protocol implementations.
>
> The basic user documentation for File Manager can be found
> http://oodt.apache.org/components/maven/filemgr/user/basic.html
>
> The advanced (certainly advanced to a level we currently require)
> documentation for File Manager can be found
> http://oodt.apache.org/components/maven/filemgr/development/developer.html
>
> I am sure you will find the latter useful.
>
> Please write back here once you've engaged with this, we will take it from
> there.
> It may also be a good idea to be jotting down how you see a potential
> workflow coming together e.g. what tasks do you think will need to be
> implemented in Avro RPC. This will make it much easier when we actually
> come to running and planning against a schedule of work.
>
> Thanks again
> Lewis
>
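As a rough illustration of the size argument at the top of this message (XML-RPC text versus a binary encoding), here is a minimal sketch that uses only the Java standard library. It is not OODT code and does not use the Avro library. It hand-rolls the two encoding rules the Avro specification describes for this case: a string is written as a zigzag varint byte length followed by its UTF-8 bytes, and a record is just its field values in schema order. The class name, the field names (id, name, structure) and the sample values are hypothetical, and the XML side is only a bare <struct> without the surrounding XML-RPC method-call envelope or any XML escaping.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: compares an XML-RPC style struct with an Avro-style binary record. */
class EncodingSizeSketch {

    /** Writes a long as a zigzag-encoded variable-length integer (Avro's int/long rule). */
    static void writeZigZagLong(long value, ByteArrayOutputStream out) {
        long n = (value << 1) ^ (value >> 63);    // zigzag: small magnitudes map to small unsigned values
        while ((n & ~0x7FL) != 0) {
            out.write((int) ((n & 0x7F) | 0x80)); // 7 payload bits; high bit set means more bytes follow
            n >>>= 7;
        }
        out.write((int) n);
    }

    /** Avro-style string: zigzag varint byte length, then the UTF-8 bytes. */
    static void writeString(String s, ByteArrayOutputStream out) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeZigZagLong(utf8.length, out);
        out.write(utf8, 0, utf8.length);
    }

    /** Record: field values in schema order; field names are not written to the wire. */
    static byte[] encodeBinary(Map<String, String> fields) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String value : fields.values()) {
            writeString(value, out);
        }
        return out.toByteArray();
    }

    /** XML-RPC style struct with the same fields as named string members (no XML escaping). */
    static String encodeXmlRpcStruct(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<struct>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("<member><name>").append(e.getKey()).append("</name>")
              .append("<value><string>").append(e.getValue()).append("</string></value></member>");
        }
        return sb.append("</struct>").toString();
    }

    public static void main(String[] args) {
        // Hypothetical product fields; the real Product struct has more than this.
        Map<String, String> product = new LinkedHashMap<>();
        product.put("id", "prod-0001");
        product.put("name", "blah.txt");
        product.put("structure", "Flat");

        int xmlBytes = encodeXmlRpcStruct(product).getBytes(StandardCharsets.UTF_8).length;
        int binBytes = encodeBinary(product).length;
        System.out.println("XML-RPC style struct: " + xmlBytes + " bytes");
        System.out.println("Avro-style binary:    " + binBytes + " bytes");
    }
}
```

The binary form is smaller mainly because the field names and markup live in the schema rather than in every message. Payload size is only one part of latency, so measuring against a running filemgr server would still be needed to know the real overhead.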
