I have created a JIRA epic to track all the tasks: https://issues.apache.org/jira/browse/ZEPPELIN-1525

I think I would start with the synchronized blocks and then move on to
Eric's PR for Guice DI. Once we have a DI mechanism, it will be much easier
to inject thread pools for thread management and also to set up JMX
monitoring.

Any objections before I start coding?

On Sat, Oct 8, 2016 at 2:05 PM, Eric Charles <e...@apache.org> wrote:

> On 04/10/16 12:54, Anthony Corbacho wrote:
>
>> You made my day, this is the kind of email I really like!!
>>
>> I think it's a great idea and I am willing to spend some time on it.
>>
>> I also want to move to a DI (Guice) architecture, let me know what you
>> think about it.
>
> A PR is open for Guice DI. If someone jumps in for review, I can rebase.
>
> https://github.com/apache/zeppelin/pull/1361
>
>> On Tuesday, 4 October 2016, DuyHai Doan <doanduy...@gmail.com> wrote:
>>
>>> Hello devs
>>>
>>> The code base of Zeppelin has grown very fast in the last 12 months and
>>> it's great. It means that we have more and more contributors.
>>>
>>> However, to keep the project maintainable in the long term, we need
>>> regular code refactoring.
>>>
>>> I have some ideas to share with you.
>>>
>>> 1) Use Java 8 to benefit from lambdas & streams
>>>
>>> Now that Java 8 is well established, it is a good time to upgrade the
>>> project. I believe some interpreters also need Java 8. The Cassandra
>>> interpreter right now does not have unit tests for the latest features
>>> because the embedded Cassandra server used for testing requires Java 8.
>>>
>>> It would also be a good opportunity to go through the code base and
>>> replace boilerplate for() loops that filter manually with the stream
>>> shortcut: list.stream().filter(..).map(). It would greatly improve
>>> code readability.
>>>
>>> 2) Multi threading
>>>
>>> I've seen synchronized blocks used in a few places in the code base.
>>> Although perfectly valid, they have a cost at runtime, and since more
>>> and more people are asking for multi-tenancy or using a single Zeppelin
>>> instance to serve multiple users, I guess the synchronized blocks have
>>> a huge cost.
>>>
>>> There are some solid alternatives:
>>>
>>> - ConcurrentHashMap if we synchronize on a map
>>> - CopyOnWriteArrayList if we synchronize on a list
>>>
>>> Of course, each synchronized block should be handled carefully so as
>>> not to introduce regressions.
>>>
>>> 3) Thread management
>>>
>>> I've seen some usage of new Thread() {...}.run(); it may be a good time
>>> to introduce thread pools and pass them along (inside context objects,
>>> for example) to have more centralized thread management.
>>>
>>> The advantage of having thread pools is that we can manage them in a
>>> single place, monitor them and expose the info through JMX, and also
>>> control system resources by defining the maximum number of threads and
>>> the thread pool queue.
>>>
>>> 4) Server monitoring
>>>
>>> I hear many users in the field complain that they have to restart the
>>> Zeppelin server regularly because it "hangs" after running for a long
>>> time.
>>>
>>> If we can expose some system metrics through JMX, it would help people
>>> monitor the state of the Zeppelin server and take appropriate action.
>>>
>>> Right now we may focus only on monitoring the server itself, not the
>>> interpreter JVM processes. That can be done in a second step.
>>>
>>> What do you think about the ideas?
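
To make point 1 of the quoted proposal concrete, here is a minimal sketch of the kind of rewrite meant by list.stream().filter(..).map(). The class name, method names and sample data are made up for illustration and are not taken from the Zeppelin code base.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical example: keep the non-empty names and upper-case them.
class StreamSketch {

    // Before: for() loop with manual filtering.
    static List<String> withLoop(List<String> names) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            if (!name.isEmpty()) {
                result.add(name.toUpperCase());
            }
        }
        return result;
    }

    // After: the same logic as a stream pipeline.
    static List<String> withStream(List<String> names) {
        return names.stream()
                .filter(name -> !name.isEmpty())
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("spark", "", "cassandra");
        System.out.println(withLoop(names));   // [SPARK, CASSANDRA]
        System.out.println(withStream(names)); // [SPARK, CASSANDRA]
    }
}
```

Both versions return the same list; the stream version just states the filter and the mapping directly.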
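
For point 2 of the quoted proposal, a rough sketch of replacing a synchronized block around a plain map with ConcurrentHashMap. The counter class and the "runs per user" idea are hypothetical, chosen only to show the pattern. Note that a read-then-write sequence has to become a single atomic call such as merge(), otherwise removing the lock would introduce exactly the kind of regression mentioned in the proposal.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical example: count paragraph runs per user without a lock.
class ConcurrentCounterSketch {

    // Before (for comparison), with a plain HashMap:
    //   synchronized (runsPerUser) {
    //       Integer c = runsPerUser.get(user);
    //       runsPerUser.put(user, c == null ? 1 : c + 1);
    //   }
    private final ConcurrentHashMap<String, Integer> runsPerUser = new ConcurrentHashMap<>();

    // After: merge() does the read-modify-write atomically per key.
    void recordRun(String user) {
        runsPerUser.merge(user, 1, Integer::sum);
    }

    int runsFor(String user) {
        return runsPerUser.getOrDefault(user, 0);
    }

    // Increments one key from several threads and returns the final count.
    static int hammer(int threads, int incrementsPerThread) {
        ConcurrentCounterSketch counter = new ConcurrentCounterSketch();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counter.recordRun("alice");
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timed out waiting for workers");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.runsFor("alice");
    }

    public static void main(String[] args) {
        System.out.println(hammer(8, 10_000)); // prints 80000
    }
}
```

No update is lost even though there is no synchronized block, because every increment goes through merge().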
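
For points 3 and 4 of the quoted proposal, the sketch below shows one possible shape for a bounded thread pool whose statistics are exposed through JMX. Everything named here (the class, the PoolStatsMXBean interface, the "sketch.zeppelin" ObjectName domain, the pool sizes) is an assumption for illustration only; it uses only standard JDK classes and is not how Zeppelin is wired today.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class ManagedPoolSketch {

    // Hypothetical management interface; the "MXBean" suffix marks it as a JMX MXBean.
    public interface PoolStatsMXBean {
        int getActiveCount();
        int getQueueSize();
        long getCompletedTaskCount();
    }

    // Read-only view over a ThreadPoolExecutor.
    public static class PoolStats implements PoolStatsMXBean {
        private final ThreadPoolExecutor pool;
        PoolStats(ThreadPoolExecutor pool) { this.pool = pool; }
        @Override public int getActiveCount() { return pool.getActiveCount(); }
        @Override public int getQueueSize() { return pool.getQueue().size(); }
        @Override public long getCompletedTaskCount() { return pool.getCompletedTaskCount(); }
    }

    // Fixed number of threads plus a bounded queue caps the resources the pool can use.
    static ThreadPoolExecutor newBoundedPool(int maxThreads, int queueCapacity) {
        return new ThreadPoolExecutor(maxThreads, maxThreads, 60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueCapacity));
    }

    // Runs `tasks` no-op tasks, then reads the completed count back through JMX.
    static long runAndCount(int tasks) {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ThreadPoolExecutor pool = newBoundedPool(4, 100);
        try {
            ObjectName name = new ObjectName("sketch.zeppelin:type=ThreadPool,name=demo");
            server.registerMBean(new PoolStats(pool), name);
            try {
                for (int i = 0; i < tasks; i++) {
                    pool.execute(() -> { });
                }
                pool.shutdown();
                if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                    throw new IllegalStateException("timed out waiting for tasks");
                }
                return (Long) server.getAttribute(name, "CompletedTaskCount");
            } finally {
                server.unregisterMBean(name);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("completed tasks seen via JMX: " + runAndCount(5));
    }
}
```

While such a pool is registered, a JMX client such as jconsole attached to the JVM can read the same attributes, which is the kind of visibility that would help with the "hangs" reports. With DI in place, the pool could be injected instead of being created inline.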