Thanks Gopal. Does ORC conversion have to see the entire data set before it
can write output? My source table is fairly large (~70 TB), and I am trying
to convert it to ORC. Both the source and destination tables are on WASB
remote storage, which has plenty of space. But the conversion job runs out of
disk space while running the reducer stage of the ORC conversion query. Are
there alternative ways to do the ORC conversion that do not fill up the disk?

Thanks,
Dharmesh

On Tue, Nov 1, 2016 at 7:30 PM, Gopal Vijayaraghavan <[email protected]>
wrote:

> > The NameNode UI is reporting "non-DFS used" as 90% of the capacity.
>
> That space is unlikely to be related to the hive or tez scratch dir
> configs.
>
> If you inspect your disks with the command below (adjusting /grid/* to
> wherever your disks are mounted)
>
> du -sh /grid/*/yarn/*
>
> you will have some idea of what is occupying that space - whether it is
> logs, local data or shuffle data.
>
> Cheers,
> Gopal
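Building on Gopal's `du -sh /grid/*/yarn/*` suggestion, here is a minimal sketch that totals usage per YARN subdirectory name across all disks, largest first, so it is easier to see whether logs or local data dominate. It assumes the /grid/<n>/yarn/<kind> layout from his command; the actual subdirectory names depend on the cluster's yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs settings. The function name summarize_yarn_usage_by_kind is hypothetical.

```shell
# Hypothetical helper: sum `du` usage per YARN subdirectory name
# (e.g. "local", "log") across every disk under ROOT, largest first.
# Assumes the /grid/<n>/yarn/<kind> layout; pass another ROOT if yours differs.
# Output lines are "<KiB>\t<kind>".
summarize_yarn_usage_by_kind() {
  local root="${1:-/grid}"
  # du -sk prints "<KiB>\t<path>" per match. Errors (e.g. permission denied on
  # container dirs) are discarded, so run as root or the yarn user, otherwise
  # the totals will be undercounted.
  du -sk "$root"/*/yarn/* 2>/dev/null |
    awk -F'\t' '{
      n = split($2, parts, "/")   # last path component is the "kind"
      kib[parts[n]] += $1         # accumulate across all disks
    }
    END { for (k in kib) printf "%.0f\t%s\n", kib[k], k }' |
    sort -rn
}
```

Usage: `summarize_yarn_usage_by_kind /grid`. Whichever kind appears first is the best place to start drilling down with a plain `du -sh` on the individual directories.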
