[ 
https://issues.apache.org/jira/browse/ORC-228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated ORC-228:
-------------------------------
    Description: 
Currently, addedRow() looks like this:
{noformat}
public void addedRow(int rows) throws IOException {
  rowsAddedSinceCheck += rows;
  if (rowsAddedSinceCheck >= ROWS_BETWEEN_CHECKS) {
    notifyWriters();
  }
}
{noformat}

It would be convenient for testing to set ROWS_BETWEEN_CHECKS to a low value so 
that we can generate multiple stripes with very little data.

Currently the only way to do this is to create a new MemoryManager that 
overrides this method and install it via OrcFile.WriterOptions, but this only 
works when you have control over creating the Writer.
For an example, see 
_org.apache.hadoop.hive.ql.io.orc.TestOrcRawRecordMerger.testRecordReaderNewBaseAndDelta()_

There is no way to do this via config params, for example to make a Hive 
query create multiple stripes with little data.
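
A rough sketch of what a configurable threshold could look like. This is a 
self-contained toy model, not the real MemoryManagerImpl: the class name, the 
property key, the default of 5000, and the use of java.util.Properties in 
place of a Hadoop Configuration are all assumptions made for illustration.

```java
import java.util.Properties;

/**
 * Toy model of the row-count check in addedRow(). NOT the real
 * MemoryManagerImpl; names, key, and default below are assumptions.
 */
class ConfigurableRowCheck {
  // Hypothetical config key; the issue itself does not name one.
  static final String ROWS_BETWEEN_CHECKS_KEY = "orc.rows.between.memory.checks";
  // Assumed default, used only when the key is absent.
  static final int DEFAULT_ROWS_BETWEEN_CHECKS = 5000;

  private final int rowsBetweenChecks;
  private long rowsAddedSinceCheck = 0;
  private int checksTriggered = 0;

  ConfigurableRowCheck(Properties conf) {
    String value = conf.getProperty(ROWS_BETWEEN_CHECKS_KEY);
    int parsed = (value == null)
        ? DEFAULT_ROWS_BETWEEN_CHECKS
        : Integer.parseInt(value.trim());
    if (parsed < 1) {
      throw new IllegalArgumentException(
          ROWS_BETWEEN_CHECKS_KEY + " must be >= 1, got " + parsed);
    }
    this.rowsBetweenChecks = parsed;
  }

  // Same shape as the method quoted above, but compares against an
  // instance field read from configuration instead of a constant.
  void addedRow(int rows) {
    rowsAddedSinceCheck += rows;
    if (rowsAddedSinceCheck >= rowsBetweenChecks) {
      notifyWriters();
    }
  }

  // Stand-in for the real callback: counts the check and resets the counter.
  private void notifyWriters() {
    checksTriggered++;
    rowsAddedSinceCheck = 0;
  }

  int getChecksTriggered() {
    return checksTriggered;
  }
}
```

In this model, setting the key to 2 and calling addedRow(1) five times 
triggers two checks, while leaving the key unset triggers none for the same 
five rows. A test or a Hive session could then lower the threshold through 
configuration alone, without constructing the Writer itself.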

  was:
currently addedRow() looks like
{noformat}
public void addedRow(int rows) throws IOException {
    rowsAddedSinceCheck += rows;
    if (rowsAddedSinceCheck >= ROWS_BETWEEN_CHECKS) {
      notifyWriters();
    }
  }
{noformat}

it would be convenient for testing to set ROWS_BETWEEN_CHECKS to a low value so 
that we can generate multiple stripes with very little data.

Currently the only way to do this is to create a new MemoryManager that 
overrides this method and install it via OrcFile.WriterOptions but this only 
works when you have control over creating the Writer.

There is no way to do this via some set of config params to make Hive query for 
example, create multiple stripes with little data.


> Make MemoryManagerImpl.ROWS_BETWEEN_CHECKS configurable
> -------------------------------------------------------
>
>                 Key: ORC-228
>                 URL: https://issues.apache.org/jira/browse/ORC-228
>             Project: ORC
>          Issue Type: Improvement
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
