Oliver Drewes created ZEPPELIN-483:
--------------------------------------

             Summary: Cronjob: Infinity interpreting notes -> Infinity new files & inodes
                 Key: ZEPPELIN-483
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-483
             Project: Zeppelin
          Issue Type: Bug
          Components: Core
    Affects Versions: 0.6.0
         Environment: Build for Spark 1.4.1 / mapr5
            Reporter: Oliver Drewes
            Priority: Blocker

Let's start with the basics: Zeppelin will always write to the tmp folder. Whatever you enter in the Spark interpreter settings, Zeppelin keeps writing its compiled Spark source to /tmp/spark-{ID}. No ENV variable will change this behaviour.

This means every file it creates while interpreting the individual lines of code consumes an inode. This wouldn't matter if you ran a note only once. But if you run it regularly or in a cronjob, each line of your note is interpreted again and again. So 30 lines of code produce about 200 files. If you run the cronjob once a minute, it produces about 12000 files an hour.

Interpreting the code line by line without checking whether a compiled result already exists is a bad solution. A 1 GB filesystem, for example, has about 65k inodes available. This means that if you run your note for a few hours, it needs only 100 MB of space but produces 65k files, and you run out of inodes.

My idea for a solution would be to check whether the note has changed. If it has changed, delete the old class files and run it again. If it is the same, reuse the existing classes. If a class with the same hash already exists, reuse that class.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
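
To illustrate the proposed fix, here is a minimal sketch in Python of a hash-keyed cache for compiled output. It is not Zeppelin code and does not use any Zeppelin or Spark API: the class CompileCache, its methods, the file name Paragraph.class and the stand-in "compile" step (which only writes one placeholder file) are all hypothetical names chosen for illustration. The sketch only shows the idea from the description: compile a piece of source once per distinct hash, reuse the output while the source is unchanged, and delete output whose source is no longer in the note.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def source_hash(source: str) -> str:
    """Return a stable SHA-256 hex digest of a paragraph's source text."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


class CompileCache:
    """Hypothetical cache: one output directory per distinct source hash."""

    def __init__(self, root: Path):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.compile_count = 0  # how often the (stand-in) compiler really ran

    def _compile(self, source: str, out_dir: Path) -> None:
        # Stand-in for the real compiler: writes a single placeholder file.
        out_dir.mkdir(parents=True)
        (out_dir / "Paragraph.class").write_text(source, encoding="utf-8")
        self.compile_count += 1

    def get_or_compile(self, source: str) -> Path:
        """Reuse the output for unchanged source, compile only new source."""
        out_dir = self.root / source_hash(source)
        if not out_dir.exists():
            self._compile(source, out_dir)
        return out_dir

    def prune(self, live_sources) -> int:
        """Delete outputs whose source is no longer part of the note."""
        live = {source_hash(s) for s in live_sources}
        removed = 0
        for entry in list(self.root.iterdir()):
            if entry.is_dir() and entry.name not in live:
                shutil.rmtree(entry)
                removed += 1
        return removed


def count_files(root: Path) -> int:
    """Count regular files below root, i.e. the inodes used for output."""
    return sum(1 for p in Path(root).rglob("*") if p.is_file())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache = CompileCache(Path(tmp) / "classes")
        note = ["val x = 1", "val y = x + 1"]
        for _ in range(60):  # simulate one hour of a once-a-minute cronjob
            for line in note:
                cache.get_or_compile(line)
        # Compilations and files stay at one per distinct line of the note.
        print(cache.compile_count, count_files(cache.root))
```

With this layout the number of files depends on how many distinct pieces of source exist, not on how often the cronjob fires. Calling prune with the current lines of the note after an edit removes the directories left over from lines that were changed or deleted.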