Hi Onkar,

please note the subject change (to keep the discussion focused).

> How to dive into a large codebase ?

To answer your question: there are many ways to approach a large codebase, and you have to find your own; nobody except yourself can teach you this. Zeppelin is still quite small, so now is a good time to start.

You should begin by checking out the project from VCS, building it locally, and importing it into the IDE of your choice (mvn eclipse:eclipse and then "Import existing project" would do for Eclipse). Then start exploring the top-level sub-modules/folders from there.

Throughout this summer project you will be mostly interested in NotebookRepo, so a good start is to tinker with the existing implementations first.

An interesting week-long pilot project to help you get started, which would also make a decent blog post subject, is to implement a new XmlNotebookRepo. The goal: have an .xml representation of the notebook persisted in the local filesystem alongside the existing .json one. It could be just note.xml in the same folder, or `./notebook-xml/<noteId>/note.xml`; it's up to you. It should save the same notebook, but in XML format, just in the local filesystem. Then, like any other storage [1], it can be configured for use together with the existing VFSNotebookRepo through NotebookRepoSync.

While working on this first project, please follow Zeppelin's contribution guidelines [3]. As this is a pilot, please feel free to create the final PR against [4] instead of the Apache GitHub mirror to showcase your work. Feel free to look around at what other PRs look like and what reviewers usually ask for (documentation, simple tests, etc.).

What do you think? Would you be willing to accept?

As I mentioned before, open source is a place where self-learning is king, and nobody will teach you in the university sense of the word.

Hope it all makes sense, and I'm looking forward to your first PR soon! :)

P.S. In this book [2] you can see how experienced engineers in other projects approach this task. Nice read.

1. http://zeppelin.incubator.apache.org/docs/0.6.0-incubating-SNAPSHOT/storage/storage.html
2. http://www.codersatwork.com
3. https://github.com/apache/incubator-zeppelin/blob/master/CONTRIBUTING.md
4. https://github.com/bzz/incubator-zeppelin

On Mon, Apr 25, 2016 at 7:09 PM, onkar shedge <shedge31on...@gmail.com> wrote:

> Sorry forgot to mention this in previous email.
>
> I wanted to ask one thing.
> How to dive into a large codebase ?. I read some answers suggesting see old
> commits how the project was developed and debug to see control flow.
>
> On Mon, Apr 25, 2016 at 3:26 PM, onkar shedge <shedge31on...@gmail.com>
> wrote:
>
> > Hello Alex,
> >
> > Here is a link to my blog[1].
> > I have added google calendar with three weeks events. I haven't written
> > any posts about my progress, Will do soon.
> >
> > 1] http://gsoc2016onkar.blogspot.in/
> >
> > On Sun, Apr 24, 2016 at 4:34 PM, Alexander Bezzubov <b...@apache.org>
> > wrote:
> >
> >> Thank you Onkar,
> >>
> >> Its great to have you on board and looking forward experiments with P2P
> >> notebook storage!
> >>
> >> It's great idea to keep list posted on your progress as well as having
> >> deeper writeups on a personal blog, please feel free to share links, etc.
> >>
> >> It would be good if in emails you could not only list things that you did
> >> last week, but also to include a brief plan for the next week. This way it
> >> should be easier for me as a mentor to align efforts in the same
> >> direction.
> >>
> >> It is going to be an exciting project!
> >>
> >> --
> >> Alex
> >>
> >> On Sat, Apr 23, 2016, 22:55 onkar shedge <shedge31on...@gmail.com> wrote:
> >>
> >> > Hello zeppelin community,
> >> >
> >> > Thanks for giving me the opportunity. I will get myself more familiar with
> >> > the codebase and ask questions on mailing list about my doubts . Also I
> >> > will post updates weekly/(4 day interval) what I understood,what I
> >> > worked/read, on my blog.
> >> >
> >> >
> >> > Regards,
> >> > Onkar Shedge.
> >> >
> >> > On Fri, Mar 25, 2016 at 10:59 AM, Alexander Bezzubov <b...@apache.org>
> >> > wrote:
> >> >
> >> > > Hi Onkar,
> >> > >
> >> > > that sounds great, thank you.
> >> > >
> >> > > Looking forward helping with this project though the summer!
> >> > >
> >> > > --
> >> > > Alex
> >> > >
> >> > > On Fri, Mar 25, 2016 at 2:27 PM, onkar shedge <shedge31on...@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > Thanks Sir,
> >> > > > I have made the changes: diagram, Deliverables removed research. And
> >> > > > support for one more p2p storage.
> >> > > > Also I have uploaded the pdf.
> >> > > >
> >> > > > Regards,
> >> > > > Onkar Shedge
> >> > > >
> >> > > > On Thu, Mar 24, 2016 at 3:49 PM, Alexander Bezzubov <b...@apache.org>
> >> > > > wrote:
> >> > > >
> >> > > > > Hi Onkar,
> >> > > > >
> >> > > > > thank you for sharing a blog and even a video of your explorations in
> >> > > > > preparation for the project.
> >> > > > > Your timeline and proposal looks very strong and it seems that you relevant
> >> > > > > experience for this project.
> >> > > > >
> >> > > > > On the GSoC scope - it would be a good start with a storage
> >> > > > > implementation(s) that result in dat://, magnet:// or ipfs:// links for a
> >> > > > > notebooks, as a first step. From there sharing the link can be done through
> >> > > > > any communication medium (IM, email, etc) and importing such link in
> >> > > > > Zeppelin instance is a matter of changing "Import" dialog\backend inside
> >> > > > > Zeppelin to support it. And then proper note versioning\modification
> >> > > > > support, as well as make sure that it plays nicely with multiple
> >> > > > > NotebookRepo plugged in though NotebookRepoSync [1].
> >> > > > > I would expect at least those use-cases to be implemented as a part of the
> >> > > > > GSoC project.
> >> > > > >
> >> > > > > On the deliverables:
> >> > > > > - "research" is not a deliverable item, may be would be better to put a
> >> > > > > "Report on results of the research, covering suitability of each p2p
> >> > > > > network\stack for the Zeppelin case"
> >> > > > >
> >> > > > > Would you be willing to, just as an extra bonus material, take care of more
> >> > > > > the one p2p NotebookRepo implementation, of course in case if time permits?
> >> > > > >
> >> > > > > I have also added few comments to the doc itself.
> >> > > > >
> >> > > > > Please feel free to incorporate feedback do not forget to submit the final
> >> > > > > pdf to google before the deadline tomorrow!
> >> > > > >
> >> > > > > 1. https://github.com/apache/incubator-zeppelin/blob/master/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/repo/NotebookRepoSync.java#L40
> >> > > > >
> >> > > > > --
> >> > > > > Alex
> >> > > > >
> >> > > > > On Thu, Mar 24, 2016 at 12:11 PM, onkar shedge <shedge31on...@gmail.com>
> >> > > > > wrote:
> >> > > > >
> >> > > > > > Hi moon,
> >> > > > > > Thanks for your idea. You talked about single online storage and then
> >> > > > > > sharing with others like(Google docs). Then handling fault tolerance as
> >> > > > > > multiple instances would change the same storage repo.
> >> > > > > > I was thinking till now that P2P implementation would be generating the
> >> > > > > > torrent file in case of Bittorrent or dat ://......dat link in case of dat
> >> > > > > > protocol and that file would be changed/versioned as changes are made new
> >> > > > > > hashes will be generated. So the question now is how to share the torrent
> >> > > > > > file or dat link to other peers?
> >> > > > > >
> >> > > > > > Is it that there would be a Zeronet site for having a list of peers online
> >> > > > > > and each user sharing his notebooks which he has chosen to share.
> >> > > > > >
> >> > > > > > On Thu, Mar 24, 2016 at 7:59 AM, moon soo Lee <m...@apache.org> wrote:
> >> > > > > >
> >> > > > > > > Hi,
> >> > > > > > >
> >> > > > > > > Scope of ZEPPELIN-683 is implementing a Zeppelin NotebookRepo [1] based on
> >> > > > > > > one of P2P technology. I think ZEPPELIN-683 leads to very interesting
> >> > > > > > > challenge (as a future work).
> >> > > > > > >
> >> > > > > > > I can see characteristics of P2P technology based NotebookRepo as,
> >> > > > > > >
> >> > > > > > > * Massively (globally) scalable.
> >> > > > > > > * Very Elastic. Any peer can join and leave at any time.
> >> > > > > > >
> >> > > > > > > Therefore i can see following possibilities and challenges.
> >> > > > > > >
> >> > > > > > > * Make every zeppelin instance connect to the single storage network.
> >> > > > > > > * Then it is possible to provide user unlimited online notebook storage.
> >> > > > > > > * And there will be nicer way to share notebook to the other people.
> >> > > > > > > * Zeppelin currently does handle the case multiple zeppelin instance share
> >> > > > > > >   single storage. To leverage advantage of P2P technology based NotebookRepo,
> >> > > > > > >   Zeppelin need to aware that storage can be updated by other Zeppelin
> >> > > > > > >   instances. This could be challenging job.
> >> > > > > > > * I think it's very much related to support of fault tolerance.
> >> > > > > > >
> >> > > > > > > I think ZEPPELIN-683 is very wide open to be evolved. Please feel free to
> >> > > > > > > add your idea.
> >> > > > > > >
> >> > > > > > > Thanks,
> >> > > > > > > moon
> >> > > > > > >
> >> > > > > > > [1] https://github.com/apache/incubator-zeppelin/blob/master/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/repo/NotebookRepo.java
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > On Tue, Mar 22, 2016 at 10:34 PM onkar shedge <shedge31on...@gmail.com>
> >> > > > > > > wrote:
> >> > > > > > >
> >> > > > > > > > Just wondering isn't this project important as the other two Apache Beam
> >> > > > > > > > interpreter and Sample Notebooks ?
> >> > > > > > > >
> >> > > > > > > > On Tue, Mar 22, 2016 at 10:59 AM, onkar shedge <shedge31on...@gmail.com>
> >> > > > > > > > wrote:
> >> > > > > > > >
> >> > > > > > > > > Hello,
> >> > > > > > > > > Apologies for late reply.
> >> > > > > > > > > I have spend time understanding the protocol.I installed all the three
> >> > > > > > > > > techs and tried them.
> >> > > > > > > > > Also I read about the docs, whitepapers[1].I read the bittorent protocol
> >> > > > > > > > > and wrote a blog[2] about it using jekyll. Please do watch the video.
> >> > > > > > > > >
> >> > > > > > > > > I have written the proposal[3].Your feedback are welcomed. I am confused ,
> >> > > > > > > > > exactly what to write in implementation part.right now I have written about
> >> > > > > > > > > the current implementation.
> >> > > > > > > > >
> >> > > > > > > > > The available clients are
> >> > > > > > > > > IPFS - [4] Java
> >> > > > > > > > > dat : browserify [5], desktop[6], [7] python . If not available, should I
> >> > > > > > > > > consider writing own Java client? is it doable?
> >> > > > > > > > > Zeronet : I didn't understand how notebooks can be shared with zeronet
> >> > > > > > > > > which serves sites?.
> >> > > > > > > > >
> >> > > > > > > > > Please comment. I guess there are many mistakes. Thank You.
> >> > > > > > > > >
> >> > > > > > > > > 1] https://github.com/ipfs/papers/raw/master/ipfs-cap2pfs/ipfs-p2p-file-system.pdf
> >> > > > > > > > > 2] https://onkarshedge.github.io/2016/03/16/peeking-in-p2p.html
> >> > > > > > > > > https://www.youtube.com/watch?v=WxX0AjqQ28g
> >> > > > > > > > > 3] https://docs.google.com/document/d/1GVu_LEi8o6wnnoj9vrt07j8ByiDCLemYh9F9ERLyni8/edit?usp=sharing
> >> > > > > > > > > 4] https://github.com/ipfs/java-ipfs-api
> >> > > > > > > > > 5] https://github.com/karissa/dat-browserify
> >> > > > > > > > > 6] https://github.com/karissa/dat-desk
> >> > > > > > > > > 7] https://github.com/karissa/datpy
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > On Thu, Mar 10, 2016 at 6:28 PM, Alexander Bezzubov <b...@apache.org>
> >> > > > > > > > > wrote:
> >> > > > > > > > >
> >> > > > > > > > >> Hi Onkar,
> >> > > > > > > > >>
> >> > > > > > > > >> great to hear the you are interested and thank you for sharing the example
> >> > > > > > > > >> notebook that you'v built, preview [0] looks great.
> >> > > > > > > > >>
> >> > > > > > > > >> I encourage you review this mailing list archives very carefully, looking
> >> > > > > > > > >> for the advices to other students on how to get started with zeppelin and
> >> > > > > > > > >> proceed with proposal draft [1] [2] [3].
> >> > > > > > > > >>
> >> > > > > > > > >> Research, as well as publishing the results of such in wiki\blogs should be
> >> > > > > > > > >> substantial part of this project. The expectations are though that you will
> >> > > > > > > > >> be able to familiarize yourself with the p2p protocols at least a bit
> >> > > > > > > > >> before starting actual gsoc project. Engaging and bridging multiple project
> >> > > > > > > > >> communities is very welcome as well. Next steps could build building low-fi
> >> > > > > > > > >> PoC using JVM tools, and then a hi-fi one, using pluggable Repository
> >> > > > > > > > >> abstraction [4]
> >> > > > > > > > >>
> >> > > > > > > > >> Hope this helps and looking forward your proposal draft: plaintext in wiki
> >> > > > > > > > >> [5] or a link to a google doc will work nicely to gather the feedback and
> >> > > > > > > > >> engage with potential mentors.
> >> > > > > > > > >>
> >> > > > > > > > >> --
> >> > > > > > > > >> Alex
> >> > > > > > > > >>
> >> > > > > > > > >> 0. https://www.zeppelinhub.com/viewer/notebooks/aHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL29ua2Fyc2hlZGdlL3NhbXBsZS1ub3RlYm9va3MvbWFzdGVyLzJCRllGVVpDUC9ub3RlLmpzb24
> >> > > > > > > > >> 1. http://markmail.org/thread/abw6hoayuvi54ghk
> >> > > > > > > > >> 2. http://markmail.org/thread/j53j7d4rsiisewfb
> >> > > > > > > > >> 3. http://markmail.org/message/naocktanol5iuot3
> >> > > > > > > > >> 4. http://zeppelin.incubator.apache.org/docs/0.6.0-incubating-SNAPSHOT/storage/storage.html
> >> > > > > > > > >> 5. https://cwiki.apache.org/confluence/display/ZEPPELIN/Google+Summer+Of+Code+2016
> >> > > > > > > > >>
> >> > > > > > > > >>
> >> > > > > > > > >> On Wed, Mar 9, 2016 at 11:56 PM, onkar shedge <shedge31on...@gmail.com>
> >> > > > > > > > >> wrote:
> >> > > > > > > > >>
> >> > > > > > > > >> > Hello Alexander,
> >> > > > > > > > >> > I am Onkar from PICT, Pune India. I am interested in the project idea
> >> > > > > > > > >> > regarding Notebook distributed Storage using P2P protocols.
> >> > > > > > > > >> > In order to contribute and aid in this project, I have been working with
> >> > > > > > > > >> > Zeppelin Notebooks.This is a link to one of my sample notebook which uses a
> >> > > > > > > > >> > dataset about Indian school data from data.gov.in: github-repo
> >> > > > > > > > >> > <https://github.com/onkarshedge/sample-notebooks/blob/master/2BFYFUZCP/note.json>.
> >> > > > > > > > >> >
> >> > > > > > > > >> > I am familiar with IPython it also uses similar json(.ipynb) way to
> >> > > > > > > > >> > represent notebook. So as per my understanding we have to divide the json
> >> > > > > > > > >> > file into chunks and store in a distributed manner according to protocol.
> >> > > > > > > > >> > While I am familiar with the basics of the product and have a clear idea of
> >> > > > > > > > >> > what is required by the problem statement, I am not quite sure how to
> >> > > > > > > > >> > proceed about it. I would appreciate your guidance regarding the same. I
> >> > > > > > > > >> > was thinking about starting with a brief comparative study of the protocols
> >> > > > > > > > >> > suggested( dat, ipfs, zeronet). I hope to hear your views about this.
> >> > > > > > > > >> >
> >> > > > > > > > >> > Thanking you,
> >> > > > > > > > >> > Onkar Shedge