Hello devs

The code base of Zeppelin has grown very fast in the last 12 months and
it's great. It means that we have more and more contributors.

However, to keep the project maintainable in the long term, we need regular
code refactoring.

I have some ideas to share with you:

1) Use Java 8 to benefit from lambdas and streams.

  Now that Java 8 is well established, it is a good time to upgrade the
project. I believe some interpreters also need Java 8. The Cassandra
interpreter currently has no unit tests for its latest features
because the embedded Cassandra server used for testing requires Java 8.

 It would also be a good opportunity to go through the code base and
replace some boilerplate for() loops that filter manually with the stream
equivalent: list.stream().filter(..).map(..). It would greatly improve code
readability.
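
To make the idea concrete, here is a minimal sketch of the kind of rewrite
I have in mind. The Task class and its fields are hypothetical, made up for
illustration; they are not taken from the Zeppelin code base.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative only: Task is a hypothetical class, not Zeppelin code.
class StreamFilterSketch {

  static final class Task {
    final String name;
    final boolean running;

    Task(String name, boolean running) {
      this.name = name;
      this.running = running;
    }
  }

  // Before: explicit loop with manual filtering and mapping.
  static List<String> runningNamesLoop(List<Task> tasks) {
    List<String> names = new ArrayList<>();
    for (Task task : tasks) {
      if (task.running) {
        names.add(task.name);
      }
    }
    return names;
  }

  // After: the same logic as a stream pipeline (filter, then map, then collect).
  static List<String> runningNamesStream(List<Task> tasks) {
    return tasks.stream()
        .filter(task -> task.running)
        .map(task -> task.name)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Task> tasks = Arrays.asList(
        new Task("a", true), new Task("b", false), new Task("c", true));
    System.out.println(runningNamesLoop(tasks));
    System.out.println(runningNamesStream(tasks));
  }
}
```

Both methods return the same list; the stream version just states the
filter and the mapping directly instead of hiding them inside the loop body.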

2) Multi-threading

 I've seen synchronized blocks used in a few places in the code base.
Although perfectly valid, they have a cost at runtime, and since more and
more people are asking for multi-tenancy or using a single Zeppelin instance
to serve multiple users, I guess contention on these blocks has a
significant cost.

There are some solid alternatives:

 - ConcurrentHashMap where we synchronize on a map
 - CopyOnWriteArrayList where we synchronize on a list

Of course, each synchronized block should be reviewed carefully so that we
do not introduce regressions.
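
As a rough sketch of what such a replacement could look like (the class and
field names below are hypothetical, chosen for illustration only):

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Illustrative only: names are hypothetical, not taken from Zeppelin.
class ConcurrentCollectionsSketch {

  // Stands in for a HashMap that used to be guarded by synchronized blocks.
  private final Map<String, Integer> runCounts = new ConcurrentHashMap<>();

  // Stands in for a synchronized ArrayList. Every write copies the backing
  // array, so this suits lists that are read far more often than modified.
  private final List<String> listeners = new CopyOnWriteArrayList<>();

  // merge() is atomic on ConcurrentHashMap, so no external lock is needed
  // for this read-modify-write.
  int recordRun(String key) {
    return runCounts.merge(key, 1, Integer::sum);
  }

  int runCount(String key) {
    return runCounts.getOrDefault(key, 0);
  }

  void addListener(String name) {
    listeners.add(name);
  }

  int listenerCount() {
    return listeners.size();
  }

  // Increments one key from several threads and returns the final count.
  static int countWithThreads(int threads, int incrementsPerThread) {
    ConcurrentCollectionsSketch sketch = new ConcurrentCollectionsSketch();
    Thread[] workers = new Thread[threads];
    for (int i = 0; i < threads; i++) {
      workers[i] = new Thread(() -> {
        for (int j = 0; j < incrementsPerThread; j++) {
          sketch.recordRun("key-1");
        }
      });
      workers[i].start();
    }
    for (Thread worker : workers) {
      try {
        worker.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    return sketch.runCount("key-1");
  }

  public static void main(String[] args) {
    System.out.println(countWithThreads(4, 1000));
  }
}
```

Note that a plain get() followed by put() on a ConcurrentHashMap is still a
race; compound updates have to go through atomic methods such as merge() or
computeIfAbsent(). That is exactly the kind of detail to check block by
block.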

3) Thread management

I've seen some usage of new Thread() {...}.run(); it may be a good time to
introduce thread pools and pass them along (inside context objects, for
example) to have more centralized thread management.

The advantage of having thread pools is that we can manage them in a single
place, monitor them and expose the info through JMX, and also control system
resources by defining a maximum thread count and a thread pool queue size.
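
A minimal sketch of such a pool, using the standard ThreadPoolExecutor. The
class name, the sizes and the choice of rejection policy are my assumptions
for illustration, not a proposal for the final values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Illustrative only: names and sizes are hypothetical.
class SharedPoolSketch {

  // One pool created in a single place, with a hard cap on threads and on
  // queued tasks. When the queue is full, CallerRunsPolicy makes the
  // submitting thread run the task itself, which slows producers down
  // instead of growing without bound.
  static ThreadPoolExecutor newBoundedPool(int maxThreads, int queueCapacity) {
    return new ThreadPoolExecutor(
        maxThreads, maxThreads,
        60L, TimeUnit.SECONDS,
        new ArrayBlockingQueue<Runnable>(queueCapacity),
        new ThreadPoolExecutor.CallerRunsPolicy());
  }

  // Code that used to create its own threads receives the pool instead.
  static int sumOfSquares(ThreadPoolExecutor pool, int n) {
    List<Future<Integer>> futures = new ArrayList<>();
    for (int i = 1; i <= n; i++) {
      final int value = i;
      futures.add(pool.submit(() -> value * value));
    }
    int sum = 0;
    try {
      for (Future<Integer> future : futures) {
        sum += future.get();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e);
    }
    return sum;
  }

  public static void main(String[] args) {
    ThreadPoolExecutor pool = newBoundedPool(2, 4);
    try {
      System.out.println(sumOfSquares(pool, 10));
      // These getters are the kind of values that could be exposed for monitoring.
      System.out.println("largest pool size: " + pool.getLargestPoolSize());
      System.out.println("completed tasks: " + pool.getCompletedTaskCount());
    } finally {
      pool.shutdown();
    }
  }
}
```

ThreadPoolExecutor already exposes getPoolSize(), getActiveCount() and
getQueue().size(), so the monitoring part mostly comes for free once all
threads go through a shared pool.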

4) Server monitoring

I hear many users in the field complain that they have to restart the
Zeppelin server regularly because it "hangs" after running for a long
time.

If we can expose some system metrics through JMX, it would help people
monitor the state of the Zeppelin server and take appropriate action.

For now we could focus only on monitoring the server itself, not the
interpreter JVM processes. Those can be covered in a second step.
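
For those less familiar with JMX, here is a minimal sketch of exposing a
metric as a standard MBean on the platform MBean server. The ServerStats
class, its attributes and the ObjectName domain are hypothetical; which
metrics we actually expose is open for discussion.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Illustrative only: metric names and the ObjectName domain are hypothetical.
class JmxMetricsSketch {

  // Standard MBean convention: a public interface whose name is the
  // implementation class name plus "MBean". Each getter becomes a read-only
  // attribute (getTasksRun -> "TasksRun").
  public interface ServerStatsMBean {
    long getTasksRun();
    int getLiveThreadCount();
  }

  public static final class ServerStats implements ServerStatsMBean {
    private final AtomicLong tasksRun = new AtomicLong();

    void recordTask() {
      tasksRun.incrementAndGet();
    }

    @Override
    public long getTasksRun() {
      return tasksRun.get();
    }

    @Override
    public int getLiveThreadCount() {
      return ManagementFactory.getThreadMXBean().getThreadCount();
    }
  }

  // Registers the bean so that JMX clients such as jconsole can read it.
  static ObjectName register(ServerStats stats, String name) {
    try {
      ObjectName objectName = new ObjectName(name);
      MBeanServer server = ManagementFactory.getPlatformMBeanServer();
      server.registerMBean(stats, objectName);
      return objectName;
    } catch (JMException e) {
      throw new IllegalStateException(e);
    }
  }

  // Reads an attribute back through the MBean server, as a JMX client would.
  static Object readAttribute(ObjectName name, String attribute) {
    try {
      return ManagementFactory.getPlatformMBeanServer().getAttribute(name, attribute);
    } catch (JMException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    ServerStats stats = new ServerStats();
    ObjectName name = register(stats, "org.example.sketch:type=ServerStats,name=demo");
    stats.recordTask();
    System.out.println("TasksRun = " + readAttribute(name, "TasksRun"));
    System.out.println("LiveThreadCount = " + readAttribute(name, "LiveThreadCount"));
  }
}
```

The JVM already publishes memory, thread and GC data through its platform
MXBeans, so custom beans like this one would only be needed for metrics
specific to the server.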


What do you think about these ideas?
