Thanks Hitesh.

We do run a local HDFS, but the disk space is not being used by the
datanodes. The NameNode UI reports "Non DFS Used" at 90% of the disk size.
The intermediate output from the tasks seems to be filling up the disk.
I will follow your suggestion and post this to the Hive mailing list.
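
For anyone trying to narrow down the same question (NodeManager local dirs
vs. NodeManager log dirs vs. datanode data dirs), a minimal sketch of a
per-directory usage check is below. The paths in it are hypothetical
placeholders, not values from this cluster; they would need to be replaced
with whatever yarn.nodemanager.local-dirs, yarn.nodemanager.log-dirs and
dfs.datanode.data.dir are set to on the node. The helper names
dir_size_bytes and report are made up for illustration.

```python
import os


def dir_size_bytes(root):
    """Sum the sizes of regular files under root, skipping unreadable entries."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root, onerror=lambda err: None):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Skip symlinks so linked files are not counted twice.
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                # The file vanished or is unreadable; task output is short-lived.
                pass
    return total


def report(dirs):
    """Return (path, size_in_bytes) pairs for existing dirs, largest first."""
    sizes = [(d, dir_size_bytes(d)) for d in dirs if os.path.isdir(d)]
    return sorted(sizes, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical paths (assumption): substitute the node's configured values.
    candidates = [
        "/hadoop/yarn/local",  # e.g. yarn.nodemanager.local-dirs
        "/hadoop/yarn/log",    # e.g. yarn.nodemanager.log-dirs
        "/hadoop/hdfs/data",   # e.g. dfs.datanode.data.dir
    ]
    for path, size in report(candidates):
        print(f"{size / 1024 ** 3:8.2f} GiB  {path}")
```

Running this on a worker node while the conversion job is active should show
which of the three locations is growing. Directories that do not exist on
the node are silently skipped.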

Thanks,
Dharmesh

On Tue, Nov 1, 2016 at 6:51 PM, Hitesh Shah <[email protected]> wrote:

> There are multiple aspects of local disk. Is the disk usage being taken up
> by the NodeManager local dirs? Is it being taken by the NodeManager log
> dirs? Are you running HDFS which will also consume local disk space i.e.
> datanode’s data dirs? Could you clarify in terms of the above as to what is
> taking up a lot of space?
>
> The Tez staging dir and the Hive scratch dir are usually meant to be
> configured to point to a distributed FS. Have you configured them to use
> the Azure store FS? FWIW, in most cases, the Tez staging dir is not very
> large as it stores metadata and not the real data being processed.
>
> Additionally, this might be better to post to the hive mailing lists in
> terms of how they manage intermediate data before the table is made visible
> to other users.
>
> thanks
> — Hitesh
>
>
> > On Nov 1, 2016, at 4:40 PM, Dharmesh Kakadia <[email protected]> wrote:
> >
> > Hi,
> >
> > I am trying to understand the meaning of, and the relation between, the
> > following configurations when running Hive on Tez. My default FS is
> > Azure store, and I am trying to figure out where the local disk is being
> > used, because the disk is filling up during a large ORC table conversion.
> >
> > hive.exec.stagingdir
> > tez.staging-dir
> > hive.exec.scratchdir
> >
> > Any help?
> >
> > Thanks,
> > Dharmesh
>
>
