[
https://issues.apache.org/jira/browse/HDDS-2467?focusedWorklogId=342655&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-342655
]
ASF GitHub Bot logged work on HDDS-2467:
----------------------------------------
Author: ASF GitHub Bot
Created on: 13/Nov/19 15:32
Start Date: 13/Nov/19 15:32
Worklog Time Spent: 10m
Work Description: adoroszlai commented on pull request #152: HDDS-2467.
Allow running Freon validators with limited memory
URL: https://github.com/apache/hadoop-ozone/pull/152
## What changes were proposed in this pull request?
1. Freon validators read each item to be validated completely into a
`byte[]` buffer. This allows timing only the read (and buffer allocation), but
not the subsequent digest calculation. However, it also means that memory
required for running the validators is proportional to key size. I propose to
add a command-line flag (`-s` / `--stream`) which, when specified, makes Freon
calculate the digest while reading the input stream. This changes timing
results a bit, since values will include the time required for digest
calculation. On the other hand, Freon will be able to validate huge keys with
limited memory.
2. Reduce the memory requirement of the non-stream version by allocating a
buffer exactly the size of the key. This adds a bit of overhead in time, since
key info needs to be fetched, too. But it eliminates `ByteArrayOutputStream`,
which allocates incrementally larger and larger buffers. The latter can lead
to a memory requirement of twice the actual key size in the worst case (since
`2^n > 2^(n-1) + 2^(n-2) + ...`).
3. Get rid of code duplication between `SameKeyReader` and
`OzoneClientKeyValidator`.
4. Allow `OzoneClientKeyGenerator` to create > 2GB keys.
https://issues.apache.org/jira/browse/HDDS-2467
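The two validation modes described in items 1 and 2 can be illustrated with a minimal, self-contained sketch. It is not the code from this pull request: the class and method names (`DigestSketch`, `digestBuffered`, `digestStreaming`), the SHA-256 algorithm, and the 8 KB chunk size are assumptions chosen for illustration. Only standard JDK classes are used.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical illustration; names and digest algorithm are assumptions.
class DigestSketch {

  // Non-stream mode: read the whole item into a buffer of exactly the
  // known size, then digest it. Memory use is proportional to key size,
  // but the read can be timed separately from the digest calculation.
  static byte[] digestBuffered(InputStream in, int size)
      throws IOException, NoSuchAlgorithmException {
    byte[] data = new byte[size];
    new DataInputStream(in).readFully(data);
    return MessageDigest.getInstance("SHA-256").digest(data);
  }

  // Stream mode: update the digest chunk by chunk while reading.
  // Memory use is bounded by the chunk size, but any timing around
  // this call includes the digest calculation.
  static byte[] digestStreaming(InputStream in)
      throws IOException, NoSuchAlgorithmException {
    MessageDigest md = MessageDigest.getInstance("SHA-256");
    byte[] chunk = new byte[8192];
    int n;
    while ((n = in.read(chunk)) != -1) {
      md.update(chunk, 0, n);
    }
    return md.digest();
  }

  public static void main(String[] args) throws Exception {
    byte[] payload = new byte[100_000];
    Arrays.fill(payload, (byte) 7);
    byte[] a = digestBuffered(
        new ByteArrayInputStream(payload), payload.length);
    byte[] b = digestStreaming(new ByteArrayInputStream(payload));
    // Both modes must agree on the digest of the same content.
    System.out.println(Arrays.equals(a, b)); // prints "true"
  }
}
```

Both methods produce the same digest for the same input. The difference lies in peak memory and in what a timer around the call would measure.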
## How was this patch tested?
Created and validated keys using Freon. Verified that even a 2.5GB key can be
created and validated with `--stream`. Verified that streaming is forced for
such a large key, since it won't fit in any array. Verified that smaller keys
can be validated both ways.
```
export HADOOP_OPTS='-Xmx1024M -XX:+HeapDumpOnOutOfMemoryError'
ozone freon ockg -t 1 -F ONE -n 1 -p 2_5GB -s 2684354560
ozone freon ockg -t 1 -F ONE -n 1 -p 256MB -s 268435456
ozone freon ockg -t 1 -F ONE -n 1 -p 128MB -s 134217728
ozone freon ockg -t 1 -F ONE -n 1 -p 64MB -s 67108864
ozone freon ockg -t 1 -F ONE -n 1 -p 10KB -s 10240
export HADOOP_OPTS='-Xmx128M -XX:+HeapDumpOnOutOfMemoryError'
ozone freon ockv -t 1 -n 1 -p 10KB
ozone freon ockv -t 1 -n 1 -p 64MB
export HADOOP_OPTS='-Xmx64M -XX:+HeapDumpOnOutOfMemoryError'
ozone freon ockv -t 1 -n 1 -p 10KB -s
ozone freon ockv -t 1 -n 1 -p 64MB -s
ozone freon ockv -t 1 -n 1 -p 128MB -s
ozone freon ockv -t 1 -n 1 -p 256MB -s
ozone freon ockv -t 1 -n 1 -p 2_5GB -s
ozone freon ockv -t 1 -n 1 -p 2_5GB
ozone freon ockg -t 1 -F ONE -n 100 -p 1KB -s 1024
ozone freon ockv -n 100 -p 1KB
ozone freon ocokr -t 4 -k '64MB/0' -n 32 -s
ozone freon ocokr -t 8 -k '256MB/0' -n 16 -s
export HADOOP_OPTS='-Xmx1024M -XX:+HeapDumpOnOutOfMemoryError'
ozone freon ocokr -t 2 -k '256MB/0' -n 16
```
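The forced-streaming behaviour noted above (a 2.5GB key cannot be held in a single `byte[]`) could be decided along the following lines. This is a hedged sketch rather than the patch's actual logic: `StreamDecision`, `shouldStream`, and the `MAX_BUFFER` threshold are hypothetical, and the largest array a JVM can really allocate depends on the implementation and heap settings.

```java
// Hypothetical illustration; names and threshold are assumptions.
class StreamDecision {

  // Conservative upper bound for a byte[] length. Java arrays are
  // indexed by int, and some JVMs reserve a few header words.
  static final long MAX_BUFFER = Integer.MAX_VALUE - 8;

  // Stream if the user asked for it, or if the key cannot fit in one array.
  static boolean shouldStream(boolean streamFlag, long keySize) {
    return streamFlag || keySize > MAX_BUFFER;
  }

  public static void main(String[] args) {
    // Sizes taken from the test commands: 2.5GB and 64MB keys.
    System.out.println(shouldStream(false, 2684354560L)); // prints "true"
    System.out.println(shouldStream(false, 67108864L));   // prints "false"
  }
}
```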
Issue Time Tracking
-------------------
Worklog Id: (was: 342655)
Remaining Estimate: 0h
Time Spent: 10m
> Allow running Freon validators with limited memory
> --------------------------------------------------
>
> Key: HDDS-2467
> URL: https://issues.apache.org/jira/browse/HDDS-2467
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Components: freon
> Reporter: Attila Doroszlai
> Assignee: Attila Doroszlai
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Freon validators read each item to be validated completely into a {{byte[]}}
> buffer. This allows timing only the read (and buffer allocation), but not
> the subsequent digest calculation. However, it also means that memory
> required for running the validators is proportional to key size.
> I propose to add a command-line flag to allow calculating the digest while
> reading the input stream. This changes timing results a bit, since values
> will include the time required for digest calculation. On the other hand,
> Freon will be able to validate huge keys with limited memory.