Thanks Hitesh. We do run a local HDFS, but the disk space is not being used by the datanodes. The NameNode UI reports "non-DFS used" at 90% of the size. The intermediate output from the tasks seems to be filling up the disk. I will follow your suggestion and post this to the Hive mailing list.
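
For anyone who wants to narrow down where "non-DFS" space is going on a node, here is a minimal sketch that totals the size of a few candidate directories and lists them largest first. The paths in it are hypothetical placeholders, not values from this cluster; they would need to be replaced with whatever yarn.nodemanager.local-dirs, yarn.nodemanager.log-dirs and dfs.datanode.data.dir point to on the node being checked.

```python
import os
import stat


def dir_size_bytes(root):
    """Total size in bytes of regular files under root (symlinks are not followed)."""
    total = 0
    # onerror swallows unreadable directories so one bad path does not abort the scan
    for dirpath, _dirnames, filenames in os.walk(root, onerror=lambda err: None):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode):
                total += st.st_size
    return total


def usage_report(paths):
    """Return (path, bytes) pairs sorted largest first; non-directories are skipped."""
    sizes = [(p, dir_size_bytes(p)) for p in paths if os.path.isdir(p)]
    return sorted(sizes, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical locations: substitute the directories configured on your node.
    candidates = ["/hadoop/yarn/local", "/hadoop/yarn/log", "/hadoop/hdfs/data"]
    for path, size in usage_report(candidates):
        print(f"{size / 2**30:8.2f} GiB  {path}")
```

Running this while a large job is in flight and again after it finishes should show whether the NodeManager local dirs are the ones growing.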

Thanks,
Dharmesh

On Tue, Nov 1, 2016 at 6:51 PM, Hitesh Shah <[email protected]> wrote:

> There are multiple aspects of local disk. Is the disk usage being taken up by the NodeManager local dirs? Is it being taken by the NodeManager log dirs? Are you running HDFS which will also consume local disk space i.e. datanode’s data dirs? Could you clarify in terms of the above as to what is taking up a lot of space?
>
> The tez staging dir, hive scratch dir are usually meant to be configured to point to a distributed FS. Have you configured them to use the Azure store FS? FWIW, in most cases, the Tez staging dir is not very large as it stores meta data and not the real data being processed.
>
> Additionally, this might be better to post to the hive mailing lists in terms of how they manage intermediate data before the table is made visible to other users.
>
> thanks
> -- Hitesh
>
> > On Nov 1, 2016, at 4:40 PM, Dharmesh Kakadia <[email protected]> wrote:
> >
> > Hi,
> >
> > I am trying to understand meaning and relation between following configurations when running Hive on Tez. I have default FS as Azure store and trying to figure out where all the local disk is utilized because I am running into disk space filling up while large ORC table conversion.
> >
> > hive.exec.stagingdir
> > tez.staging-dir
> > hive.exec.scratchdir
> >
> > Any help ?
> >
> > Thanks,
> > Dharmesh
