There are multiple aspects to local disk usage. Is the space being taken up by the NodeManager local dirs? Is it being taken by the NodeManager log dirs? Are you also running HDFS, which consumes local disk space too, i.e. the datanode's data dirs? Could you clarify which of the above is taking up a lot of space?
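If it helps to narrow this down, here is a minimal sketch that totals the disk usage per category on a single node. The paths in it are hypothetical placeholders, not values from your cluster; substitute whatever your configs set for yarn.nodemanager.local-dirs, yarn.nodemanager.log-dirs and dfs.datanode.data.dir.

```python
import os


def dir_size_bytes(root):
    """Sum the sizes of regular files under root, skipping unreadable entries."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root, onerror=lambda err: None):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Skip symlinks so a link target is not counted twice.
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                pass  # file vanished or is unreadable; ignore it
    return total


def report(dirs_by_role):
    """Return {role: total_bytes} for a mapping of role -> list of directories."""
    return {
        role: sum(dir_size_bytes(d) for d in dirs if os.path.isdir(d))
        for role, dirs in dirs_by_role.items()
    }


if __name__ == "__main__":
    # Hypothetical paths for illustration only; use the values from your own
    # yarn-site.xml and hdfs-site.xml.
    candidates = {
        "nodemanager local dirs": ["/hadoop/yarn/local"],
        "nodemanager log dirs": ["/hadoop/yarn/log"],
        "datanode data dirs": ["/hadoop/hdfs/data"],
    }
    # Print the largest consumer first.
    for role, size in sorted(report(candidates).items(), key=lambda kv: -kv[1]):
        print(f"{role}: {size / 2**30:.2f} GiB")
```

Running this on a node while the ORC conversion is in progress should show which of the three categories is growing.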
The Tez staging dir and the Hive scratch dir are usually meant to be configured to point to a distributed FS. Have you configured them to use the Azure store FS? FWIW, in most cases the Tez staging dir is not very large, as it stores metadata and not the real data being processed.

Additionally, this might be better posted to the Hive mailing lists, since the question is really about how Hive manages intermediate data before the table is made visible to other users.

thanks
-- Hitesh

> On Nov 1, 2016, at 4:40 PM, Dharmesh Kakadia <[email protected]> wrote:
>
> Hi,
>
> I am trying to understand meaning and relation between following
> configurations when running Hive on Tez. I have default FS as Azure store
> and trying to figure out where all the local disk is utilized because I am
> running into disk space filling up while large ORC table conversion.
>
> hive.exec.stagingdir
> tez.staging-dir
> hive.exec.scratchdir
>
> Any help ?
>
> Thanks,
> Dharmesh
