ASF GitHub Bot commented on ORC-228:

Github user prasanthj commented on a diff in the pull request:

    --- Diff: java/core/src/java/org/apache/orc/impl/MemoryManagerImpl.java ---
    @@ -81,6 +81,7 @@ public Thread getOwner() {
       public MemoryManagerImpl(Configuration conf) {
         double maxLoad = OrcConf.MEMORY_POOL.getDouble(conf);
    --- End diff --
    I don't think we want to support too large an interval for this. A very
high value delays the memory check, which is bad (it is better to flush often
than to not flush and fail).
    1 to 10000 may be a good range. Also, please note in the description
that a very low value is for testing only, since it can cause premature flushes
in some cases and generate sub-optimal ORC files.
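
    A minimal sketch of the kind of bounds check suggested above. The class
and method names are hypothetical and not part of ORC, and the 1 to 10000
limits are only the range proposed in this comment, not a value confirmed by
the patch:

```java
// Hypothetical helper, not part of ORC: bounds a configured check interval.
final class RowCheckInterval {
    // Assumed limits, taken from the range proposed in the review comment.
    static final long MIN_ROWS = 1;
    static final long MAX_ROWS = 10_000;

    private RowCheckInterval() {
    }

    /** Returns the requested interval, or throws if it lies outside [MIN_ROWS, MAX_ROWS]. */
    static long validate(long requested) {
        if (requested < MIN_ROWS || requested > MAX_ROWS) {
            throw new IllegalArgumentException(
                "rows between checks must be in [" + MIN_ROWS + ", " + MAX_ROWS
                    + "], got " + requested);
        }
        return requested;
    }
}
```

    Rejecting an out-of-range value is one option; silently clamping it to
the nearest bound would be another, at the cost of hiding a misconfiguration.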

> Make MemoryManagerImpl.ROWS_BETWEEN_CHECKS configurable
> -------------------------------------------------------
>                 Key: ORC-228
>                 URL: https://issues.apache.org/jira/browse/ORC-228
>             Project: ORC
>          Issue Type: Improvement
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
> Currently, addedRow() looks like this:
> {noformat}
> public void addedRow(int rows) throws IOException {
>   rowsAddedSinceCheck += rows;
>   if (rowsAddedSinceCheck >= ROWS_BETWEEN_CHECKS) {
>     notifyWriters();
>   }
> }
> {noformat}
> It would be convenient for testing to set ROWS_BETWEEN_CHECKS to a low value
> so that we can generate multiple stripes with very little data.
> Currently, the only way to do this is to create a new MemoryManager that
> overrides this method and install it via OrcFile.WriterOptions, but this only
> works when you have control over creating the Writer.
> For example:
> _org.apache.hadoop.hive.ql.io.orc.TestOrcRawRecordMerger.testRecordReaderNewBaseAndDelta()_
> There is no set of config params that would make, for example, a Hive query
> create multiple stripes with little data.
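
The effect of making the threshold configurable can be sketched with a small
stand-alone model of addedRow(). This is not the ORC implementation: the class
name is hypothetical, a counter stands in for notifyWriters(), and resetting
the row count after each check is an assumption about what notifyWriters()
does.

```java
// Hypothetical stand-alone model; it does not use or reproduce ORC classes.
final class CheckCounter {
    private final long rowsBetweenChecks; // configurable instead of a constant
    private long rowsAddedSinceCheck = 0;
    private int checks = 0;

    CheckCounter(long rowsBetweenChecks) {
        this.rowsBetweenChecks = rowsBetweenChecks;
    }

    /** Mirrors the shape of addedRow(): trigger a check once enough rows arrive. */
    void addedRow(int rows) {
        rowsAddedSinceCheck += rows;
        if (rowsAddedSinceCheck >= rowsBetweenChecks) {
            checks++;                // stands in for notifyWriters()
            rowsAddedSinceCheck = 0; // assumption: the count resets after a check
        }
    }

    /** Number of memory checks triggered so far. */
    int checks() {
        return checks;
    }
}
```

In this model, 100 single-row calls with a threshold of 10 trigger ten checks,
while the same 100 rows with a threshold of 5000 trigger none, which is why a
low value helps a test produce multiple stripes from very little data.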

This message was sent by Atlassian JIRA
