Hi,

I am using the hiveContext.sql() method to select data from a source table and insert the rows into Parquet tables. When the query is executed from Spark, it takes about 3x more disk space to write the same number of rows than when the same query is run from Impala. Is this normal behaviour, and is there a way to control it?
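For reference, one knob I'm aware of is the Parquet compression codec (Spark's `spark.sql.parquet.compression.codec` setting), since a codec mismatch between the two engines could plausibly explain a size difference. A minimal sketch of how one might set it before the insert (the table names are placeholders, and this is a configuration fragment that assumes a Spark 1.x deployment with Hive support, not a tested fix):

```python
# Configuration sketch only -- requires a running Spark/Hive environment.
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="parquet-insert")
hiveContext = HiveContext(sc)

# spark.sql.parquet.compression.codec controls the codec Spark uses when
# writing Parquet files; accepted values include "snappy", "gzip", and
# "uncompressed". Which value Impala used for its files would need to be
# checked on the Impala side.
hiveContext.setConf("spark.sql.parquet.compression.codec", "snappy")

# source_table and target_parquet are hypothetical names standing in for
# the actual tables in the original query.
hiveContext.sql(
    "INSERT INTO TABLE target_parquet SELECT * FROM source_table"
)
```

Not sure this is the right knob, so any pointers are appreciated.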
Best,
Vaibhav