[
https://issues.apache.org/jira/browse/ORC-408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16643753#comment-16643753
]
Owen O'Malley commented on ORC-408:
-----------------------------------
We absolutely need to give users better guidance about how much memory a
given schema is likely to require to create a file with a reasonable number
of rows per stripe.
The memory manager should already be imposing a limit on each writer and
shrinking it as more writers are added. If a writer goes over its limit, it
flushes the stripe. Flushing the stripe is obviously better than killing the
task. Writers can already be grouped by creating a MemoryManager per group.
So what are you looking for?
* Better bounds on the memory usage?
* More frequent checks (currently only every 5k rows)?
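
The shrinking-allowance behavior described above can be illustrated with a
simplified sketch. This is not the actual ORC MemoryManager API; the class and
method names (PoolMemoryManager, allowancePerWriter, shouldFlush) are
hypothetical, and the real implementation tracks writers by Path and notifies
them through a callback rather than exposing a boolean check:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified sketch of a shared memory pool divided evenly among writers.
 * Each writer's allowance shrinks as more writers register; a writer that
 * exceeds its allowance should flush its current stripe.
 */
class PoolMemoryManager {
    private final long poolBytes;
    private final List<String> writers = new ArrayList<>();

    PoolMemoryManager(long poolBytes) {
        this.poolBytes = poolBytes;
    }

    /** Register another writer sharing the pool. */
    void addWriter(String path) {
        writers.add(path);
    }

    /** The per-writer limit: the pool split evenly across all writers. */
    long allowancePerWriter() {
        return poolBytes / Math.max(1, writers.size());
    }

    /**
     * True when a writer's buffered bytes exceed its allowance, meaning it
     * should flush the stripe now rather than keep buffering rows.
     */
    boolean shouldFlush(long bufferedBytes) {
        return bufferedBytes > allowancePerWriter();
    }
}
```

With a 1 GB pool, one writer may buffer the full gigabyte; adding a second
writer halves each writer's allowance, which is why more writers in a task
lead to smaller stripes.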
> hard limit on memory use by ORC writers
> ---------------------------------------
>
> Key: ORC-408
> URL: https://issues.apache.org/jira/browse/ORC-408
> Project: ORC
> Issue Type: Bug
> Reporter: Sergey Shelukhin
> Priority: Major
>
> Scenario: we want to hard-limit (within the constraints imposed by using
> Java) the memory used by a particular Hive task dedicated to ORC writing, to
> protect other tasks from misbehaving queries. This is similar to how we e.g.
> limit the memory used for hash join - when the hash table goes over the
> limit, the task fails.
> However, we currently cannot hard-limit this even for a single writer while
> it is writing, much less for several writers combined.
> I wonder if it's possible to add two features to MemoryManager:
> 1) Grouping writers. A tag can be supplied externally (e.g. when creating the
> writer).
> 2) Hard-limiting the memory by tag - if the group exceeds the memory
> allowance, all the corresponding writers should be made to fail on next
> operation, via the callback.
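
The two features proposed in the quoted description could be sketched roughly
as below. This is a hypothetical design, not existing ORC code: the names
(TaggedMemoryManager, addedBytes, isFailed) and the boolean-return protocol
are assumptions standing in for the callback mechanism the issue mentions.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of a memory manager that groups writers by an
 * externally supplied tag and hard-limits each group's total usage.
 * Once a group exceeds its allowance, every writer in that group is
 * marked failed and should error out on its next operation.
 */
class TaggedMemoryManager {
    private final long hardLimitBytes;
    private final Map<String, Long> usageByTag = new HashMap<>();
    private final Map<String, Boolean> failedByTag = new HashMap<>();

    TaggedMemoryManager(long hardLimitBytes) {
        this.hardLimitBytes = hardLimitBytes;
    }

    /**
     * Record bytes buffered by a writer in the given group. Returns false
     * once the group's running total exceeds the hard limit, at which point
     * the group is permanently marked failed.
     */
    boolean addedBytes(String tag, long bytes) {
        long total = usageByTag.merge(tag, bytes, Long::sum);
        if (total > hardLimitBytes) {
            failedByTag.put(tag, Boolean.TRUE);
        }
        return !isFailed(tag);
    }

    /** Whether writers in this group should fail on their next operation. */
    boolean isFailed(String tag) {
        return failedByTag.getOrDefault(tag, Boolean.FALSE);
    }
}
```

Failing the whole group at once mirrors the hash-join behavior the issue
cites: the misbehaving query's task dies, and the rest of the process is
protected.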
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)