Thanks Gopal. Does ORC conversion have to see the entire data set before it can write output? My source table is fairly large (~70 TB), and I am trying to convert it to ORC. Both the source and destination tables are on the WASB remote store, which has plenty of space. But the conversion job runs out of disk space while running the reducer stage of the ORC conversion query. Are there alternative ways to do the ORC conversion that do not fill up the disk?

Thanks,
Dharmesh

On Tue, Nov 1, 2016 at 7:30 PM, Gopal Vijayaraghavan <[email protected]> wrote:

> > The namenode ui is reporting "non-dfs" used to be 90% of the size.
>
> That space is unlikely to be related to the hive or tez scratch dir configs.
>
> If you inspect your disks with (or wherever your disks are)
>
> du -sh /grid/*/yarn/*
>
> you will have some idea of what is occupying that space - whether it is logs, local data or shuffle data.
>
> Cheers,
> Gopal
