I have created a JIRA epic to track all the tasks:
https://issues.apache.org/jira/browse/ZEPPELIN-1525

I think I will start with the synchronized blocks and then move on to Eric's
PR for Guice DI.

Once we have a DI mechanism, it will be much easier to inject thread pools
for thread management and also to add JMX monitoring.

Any objections before I start coding?



On Sat, Oct 8, 2016 at 2:05 PM, Eric Charles <e...@apache.org> wrote:

> On 04/10/16 12:54, Anthony Corbacho wrote:
>
>> You made my day, this is the kind of email I really like!
>>
>> I think it's a great idea and I am willing to spend some time on it.
>>
>> I also want to move to a DI (Guice) architecture; let me know what you
>> think about it.
>>
>
> A PR is open for Guice DI. If someone is willing to review it, I can rebase it:
>
> https://github.com/apache/zeppelin/pull/1361
>
>
>
>
>> On Tuesday, 4 October 2016, DuyHai Doan <doanduy...@gmail.com> wrote:
>>
>>> Hello devs
>>>
>>> The code base of Zeppelin has grown very fast in the last 12 months and
>>> it's great. It means that we have more and more contributors.
>>>
>>> However, to keep the project maintainable in the long term, we need
>>> regular code refactoring.
>>>
>>> I have some ideas to share with you:
>>>
>>> 1) Use Java 8 to benefit from lambdas & streams.
>>>
>>>   Now that Java 8 is well established, it is a good time to upgrade the
>>> project. I believe some interpreters also need Java 8. The Cassandra
>>> interpreter currently has no unit tests for the latest features
>>> because the embedded Cassandra server used for testing requires Java 8.
>>>
>>>  It would also be a good opportunity to go through the code base and
>>> replace boilerplate for() loops that filter manually with the stream
>>> shortcut: list.stream().filter(..).map(..). It would greatly improve
>>> code readability.
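To illustrate the loop-to-stream refactoring described above, here is a minimal sketch. The class name, method names, and the "spark" prefix filter are hypothetical and only serve the example; they are not taken from the Zeppelin code base.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical example class; not part of Zeppelin.
class StreamFilterSketch {

  // Pre-Java 8 style: explicit loop with manual filtering and mapping.
  static List<String> namesLoop(List<String> names) {
    List<String> result = new ArrayList<>();
    for (String name : names) {
      if (name.startsWith("spark")) {
        result.add(name.toUpperCase());
      }
    }
    return result;
  }

  // Java 8 style: the same logic expressed as a stream pipeline.
  static List<String> namesStream(List<String> names) {
    return names.stream()
        .filter(name -> name.startsWith("spark"))
        .map(String::toUpperCase)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> names = Arrays.asList("spark", "cassandra", "spark-sql");
    System.out.println(namesLoop(names));
    System.out.println(namesStream(names));
  }
}
```

Both methods return the same list; the stream version states the filter and the mapping as separate, named steps instead of burying them in the loop body.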
>>>
>>> 2) Multi threading
>>>
>>>  I've seen synchronized blocks used in a few places in the code base.
>>> Although perfectly valid, they have a cost at runtime, and since more and
>>> more people are asking for multi-tenancy or using a single Zeppelin
>>> instance to serve multiple users, I guess the synchronized blocks have a
>>> huge cost.
>>>
>>> There are some solid alternatives:
>>>
>>>  - ConcurrentHashMap if we synchronize on a map
>>>  - CopyOnWriteArrayList if we synchronize on a list
>>>
>>> Of course, each synchronized block should be handled carefully so as not
>>> to introduce regressions.
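As a rough sketch of the two alternatives listed above, the following class replaces a synchronized map and list with their java.util.concurrent counterparts. The registry class, its fields, and the "note-1" key are hypothetical and used for illustration only.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical registry; names are illustrative, not Zeppelin classes.
class NoteRegistrySketch {
  private final Map<String, Integer> runCounts = new ConcurrentHashMap<>();
  private final List<String> listeners = new CopyOnWriteArrayList<>();

  // merge() is an atomic read-modify-write, so no synchronized block is needed.
  int recordRun(String noteId) {
    return runCounts.merge(noteId, 1, Integer::sum);
  }

  int runCount(String noteId) {
    return runCounts.getOrDefault(noteId, 0);
  }

  void addListener(String name) {
    listeners.add(name);
  }

  // Iteration sees a snapshot, so concurrent adds cannot cause
  // a ConcurrentModificationException here.
  int listenerCount() {
    int count = 0;
    for (String ignored : listeners) {
      count++;
    }
    return count;
  }

  // Runs several threads against one key and returns the final count.
  static int hammer(int threads, int perThread) {
    NoteRegistrySketch registry = new NoteRegistrySketch();
    Thread[] workers = new Thread[threads];
    for (int i = 0; i < threads; i++) {
      workers[i] = new Thread(() -> {
        for (int j = 0; j < perThread; j++) {
          registry.recordRun("note-1");
        }
      });
      workers[i].start();
    }
    for (Thread worker : workers) {
      try {
        worker.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    return registry.runCount("note-1");
  }

  public static void main(String[] args) {
    System.out.println(hammer(4, 1000));
  }
}
```

Two caveats apply when migrating: a sequence of separate calls on a ConcurrentHashMap (check, then put) is not atomic as a whole, so compound operations should use methods such as merge() or computeIfAbsent(); and CopyOnWriteArrayList copies the backing array on every write, so it fits read-mostly lists best.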
>>>
>>> 3) Thread management
>>>
>>> I've seen some usage of new Thread() {...}.run(); it may be a good time to
>>> introduce thread pools and pass them along (inside context objects, for
>>> example) to have more centralized thread management.
>>>
>>> The advantage of having thread pools is that we can manage them in a
>>> single place, monitor them, expose the info through JMX, and also control
>>> system resources by defining the maximum number of threads and the thread
>>> pool queue.
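A minimal sketch of the centralized thread pool idea, assuming a bounded ThreadPoolExecutor carried inside a context object. ServerContext, the pool sizes, and the rejection policy are assumptions made for the example, not an existing Zeppelin design.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; class and method names are illustrative only.
class SharedPoolSketch {

  // Bounded pool: at most maxThreads workers and queueCapacity queued tasks.
  // CallerRunsPolicy makes the submitting thread run the task when both are full.
  static ThreadPoolExecutor newBoundedPool(int maxThreads, int queueCapacity) {
    return new ThreadPoolExecutor(
        maxThreads, maxThreads,
        60L, TimeUnit.SECONDS,
        new ArrayBlockingQueue<Runnable>(queueCapacity),
        new ThreadPoolExecutor.CallerRunsPolicy());
  }

  // Hypothetical context object that carries the shared pool to its users.
  static final class ServerContext {
    private final ThreadPoolExecutor pool;

    ServerContext(ThreadPoolExecutor pool) {
      this.pool = pool;
    }

    ThreadPoolExecutor pool() {
      return pool;
    }
  }

  // Submits one task per number in 1..tasks and sums the results.
  static int sumOnPool(ServerContext context, int tasks) {
    List<Future<Integer>> futures = new ArrayList<>();
    for (int i = 1; i <= tasks; i++) {
      final int value = i;
      Callable<Integer> task = () -> value;
      futures.add(context.pool().submit(task));
    }
    int sum = 0;
    for (Future<Integer> future : futures) {
      try {
        sum += future.get();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      } catch (ExecutionException e) {
        throw new IllegalStateException(e.getCause());
      }
    }
    return sum;
  }

  // Creates the pool once, uses it through the context, then shuts it down.
  static int demo() {
    ThreadPoolExecutor pool = newBoundedPool(4, 100);
    try {
      return sumOnPool(new ServerContext(pool), 10);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

Because the pool is created in one place, its limits can be tuned there, and ThreadPoolExecutor getters such as getActiveCount() and getQueue().size() give monitoring code something concrete to read. As a side note, calling run() directly on a Thread executes the body on the calling thread; start() is what spawns a new one.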
>>>
>>> 4) Server monitoring
>>> I hear many users in the field complain that they have to restart the
>>> Zeppelin server regularly because it "hangs" after running for a long
>>> time.
>>>
>>> If we can expose some system metrics through JMX, it would help people
>>> monitor the state of the Zeppelin server and take appropriate action.
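A minimal sketch of exposing one metric through JMX with a standard MBean registered on the platform MBean server. The ServerStats bean, its ActiveNotes attribute, and the ObjectName "org.example.sketch:type=ServerStats" are hypothetical; a real integration would choose its own names and metrics.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicInteger;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical sketch; the bean and its attribute are illustrative only.
class JmxSketch {

  // Standard MBean convention: the interface is named <ImplClass>MBean
  // and every getter becomes a readable attribute.
  public interface ServerStatsMBean {
    int getActiveNotes();
  }

  public static class ServerStats implements ServerStatsMBean {
    private final AtomicInteger activeNotes = new AtomicInteger();

    public void setActiveNotes(int value) {
      activeNotes.set(value);
    }

    @Override
    public int getActiveNotes() {
      return activeNotes.get();
    }
  }

  // Registers the bean, then reads the attribute back through the MBean
  // server, which is the same path a JMX client such as JConsole would use.
  static int registerAndRead(int activeNotes) {
    try {
      MBeanServer server = ManagementFactory.getPlatformMBeanServer();
      ObjectName name = new ObjectName("org.example.sketch:type=ServerStats");
      ServerStats stats = new ServerStats();
      stats.setActiveNotes(activeNotes);
      if (server.isRegistered(name)) {
        server.unregisterMBean(name);
      }
      server.registerMBean(stats, name);
      return (Integer) server.getAttribute(name, "ActiveNotes");
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(registerAndRead(3));
  }
}
```

The same pattern could wrap thread pool statistics or other server counters, so that a hanging server can be inspected before it is restarted.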
>>>
>>> For now we could focus on monitoring the server itself, not the
>>> interpreter JVM processes. Those can be handled in a second step.
>>>
>>>
>>> What do you think about these ideas?
>>>
>>>
>>
