Oliver Drewes created ZEPPELIN-483:
--------------------------------------

             Summary: Cronjob: Infinity interpreting notes -> Infinity new 
files & inodes
                 Key: ZEPPELIN-483
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-483
             Project: Zeppelin
          Issue Type: Bug
          Components: Core
    Affects Versions: 0.6.0
         Environment: Build for Spark 1.4.1 / mapr5
            Reporter: Oliver Drewes
            Priority: Blocker


Let's start with the basics:
Zeppelin always writes to the tmp folder. Whatever you enter in the 
Spark interpreter settings, Zeppelin keeps writing its compiled Spark source to 
/tmp/spark-{ID}
No ENV variable changes this behaviour.

This means every file created while interpreting the individual lines of code 
takes an inode. This would not matter if you ran a note only once. But if you 
run it regularly or in a cronjob, each line of your note is interpreted again 
and again. So 30 lines of code produce about 200 files. If you run the cronjob 
once a minute, that is about 12,000 files an hour. Interpreting the code line 
by line without checking whether the compiled result already exists is a bad 
solution. 
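
To check the growth rate on a given machine, the files under the Spark temp 
directories can be counted before and after a cron run. The following is a 
minimal sketch, assuming the directories match /tmp/spark-* as described 
above. The function name count_spark_tmp_files is only illustrative and is not 
part of Zeppelin.

```python
import glob
import os


def count_spark_tmp_files(pattern="/tmp/spark-*"):
    """Count non-directory entries below every directory matching `pattern`."""
    total = 0
    for root in glob.glob(pattern):
        if not os.path.isdir(root):
            continue
        # os.walk descends into all subdirectories of the temp directory.
        for _dirpath, _dirnames, filenames in os.walk(root):
            total += len(filenames)
    return total


if __name__ == "__main__":
    print(count_spark_tmp_files())
```

Directories consume inodes as well, so the real inode usage is somewhat higher 
than the number this returns.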

A 1 GB filesystem, for example, has about 65k inodes available. This means if 
you run your note for some hours, it needs only 100 MB of space but produces 
65k files, and you run out of inodes. 
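
The figures above can be combined into a rough estimate of how long such a 
filesystem lasts. This is only back-of-the-envelope arithmetic using the 
approximate numbers from this report (200 files per run, one run per minute, 
65k inodes). The constant names are made up for the example.

```python
FILES_PER_RUN = 200        # about 200 files for a 30-line note (from the report)
RUNS_PER_HOUR = 60         # cron schedule of once a minute
INODES_AVAILABLE = 65_000  # roughly what the 1 GB filesystem offers (from the report)

files_per_hour = FILES_PER_RUN * RUNS_PER_HOUR
hours_until_full = INODES_AVAILABLE / files_per_hour

print(files_per_hour)              # 12000
print(round(hours_until_full, 1))  # 5.4
```

So under these assumptions the inodes are used up after roughly five and a 
half hours, long before the disk space is.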

My idea for a solution would be to check whether the note has changed. If it 
has changed, delete the old class files and run it again. 
If it is the same, reuse the existing classes.
If a class with the same hash already exists, reuse that class.
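
The idea can be sketched as a small cache keyed by a hash of the source text. 
This is not how Zeppelin or the Spark interpreter is implemented. The class 
CompiledClassCache, its methods and the compile_fn callback are hypothetical 
names, and the sketch keeps everything in memory instead of managing class 
files on disk.

```python
import hashlib


def _source_hash(source):
    """Return a stable hash for one piece of note source."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


class CompiledClassCache:
    """Reuse compiled output for source text that was already seen."""

    def __init__(self, compile_fn):
        self._compile_fn = compile_fn  # hypothetical compiler callback
        self._by_hash = {}             # source hash -> compiled artifact

    def get(self, source):
        key = _source_hash(source)
        if key not in self._by_hash:
            # Only unseen source is compiled, so an unchanged line
            # produces no new output on later runs.
            self._by_hash[key] = self._compile_fn(source)
        return self._by_hash[key]

    def retain(self, sources):
        """Drop entries whose source is no longer part of the note."""
        live = {_source_hash(s) for s in sources}
        for key in list(self._by_hash):
            if key not in live:
                del self._by_hash[key]

    def __len__(self):
        return len(self._by_hash)
```

With a cache like this, a cron run over an unchanged note would call 
compile_fn zero times, and retain() would stand in for deleting the old class 
files once a note has changed.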



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
