[ https://issues.apache.org/jira/browse/HADOOP-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-3665:
----------------------------------

    Attachment: 3665-0.patch

bq. The whole point is that I would like to understand how a Reduce job can 
output a file without any keys in it. NullWritable seemed to be an 
ideal candidate for this, but unfortunately I ran into exceptions when trying 
it. So I made a quick and dirty fix which is not meant to be production-ready 
(obviously NullWritable should not be special-cased in any way!).

I'm sorry, I hadn't understood this. If you only want to output null keys from 
your reduce, then the RecordWriter used by your OutputFormat can encode or 
ignore null keys (e.g. TextOutputFormat). SequenceFiles, as you discovered, 
explicitly disallow zero-length keys, so you'll need to pick a different binary 
file format to store output records. From a quick look at the code, this 
constraint is inconsistently enforced, and not for any reason I can discern. 
Adapting SequenceFile to handle zero-length keys might be as simple as allowing 
zero-length keys in the Writers, since the Reader appears to handle them 
already.
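
To illustrate "encode or ignore null keys", here is a minimal, in-memory sketch 
of the kind of logic a line-oriented RecordWriter (such as the one 
TextOutputFormat uses) can apply. It is not Hadoop's actual writer: NullKey is a 
hypothetical stand-in for NullWritable, and SketchLineRecordWriter and its 
methods are illustrative names only.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for NullWritable: a stateless singleton.
final class NullKey {
    static final NullKey INSTANCE = new NullKey();
    private NullKey() {}
}

// Simplified model of a line-oriented record writer that skips null keys/values.
final class SketchLineRecordWriter {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    private final byte[] separator;

    SketchLineRecordWriter(String separator) {
        this.separator = separator.getBytes(StandardCharsets.UTF_8);
    }

    // Treat both a Java null and the NullKey singleton as "no data".
    private static boolean isNull(Object o) {
        return o == null || o instanceof NullKey;
    }

    private void writeText(Object o) {
        byte[] bytes = o.toString().getBytes(StandardCharsets.UTF_8);
        out.write(bytes, 0, bytes.length);
    }

    void write(Object key, Object value) {
        boolean nullKey = isNull(key);
        boolean nullValue = isNull(value);
        if (nullKey && nullValue) {
            return; // nothing to emit for this record
        }
        if (!nullKey) {
            writeText(key);
        }
        if (!nullKey && !nullValue) {
            out.write(separator, 0, separator.length); // separator only between two fields
        }
        if (!nullValue) {
            writeText(value);
        }
        out.write('\n');
    }

    String contents() {
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

With a tab separator, writing (NullKey.INSTANCE, "alpha") produces just the line 
"alpha", so a reduce that emits only null keys yields a file of bare values.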

bq. On the other hand, there seem to be some questions which need to be asked 
and possibly addressed. One of them is that ReflectionUtils can call any 
constructor once setAccessible(true) has been called, but is this what we really 
want for singleton keys? And do we really need singleton keys at all? (I believe 
the answer is yes.)

There's already a fair amount of object reuse. Per the Writable contract we need 
an object to deserialize into, so making singletons work would require a 
registration system in ReflectionUtils like the one in WritableComparator (i.e. 
a map of classes to instances, checked before the map of classes to 
constructors). Other than NullWritable, the use cases for singleton keys I can 
think of are just badly designed, but there are likely good ones.
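
A minimal sketch of that registration idea, assuming a factory that consults a 
map of classes to shared instances before falling back to a cached, 
setAccessible constructor. SketchInstanceFactory, EmptyKey, and CounterKey are 
hypothetical names; this is not the ReflectionUtils API.

```java
import java.lang.reflect.Constructor;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical factory: registered instances are checked before constructors.
final class SketchInstanceFactory {
    private static final Map<Class<?>, Object> SINGLETONS = new ConcurrentHashMap<>();
    private static final Map<Class<?>, Constructor<?>> CONSTRUCTORS = new ConcurrentHashMap<>();

    private SketchInstanceFactory() {}

    static <T> void registerSingleton(Class<T> type, T instance) {
        SINGLETONS.put(type, instance);
    }

    static <T> T newInstance(Class<T> type) {
        Object shared = SINGLETONS.get(type);
        if (shared != null) {
            return type.cast(shared); // singleton: reuse, never construct
        }
        try {
            Constructor<?> ctor = CONSTRUCTORS.computeIfAbsent(type, SketchInstanceFactory::noArgConstructor);
            return type.cast(ctor.newInstance());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + type.getName(), e);
        }
    }

    // Look up the no-arg constructor once and suppress access checks on it.
    private static Constructor<?> noArgConstructor(Class<?> type) {
        try {
            Constructor<?> ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true);
            return ctor;
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(type.getName() + " has no no-arg constructor", e);
        }
    }
}

// Hypothetical singleton key type with a private constructor.
final class EmptyKey {
    static final EmptyKey INSTANCE = new EmptyKey();
    private EmptyKey() {}
}

// Hypothetical ordinary key type that gets a fresh instance per call.
final class CounterKey {
    int value;
}
```

Without the registration step, the constructor path would still succeed for 
EmptyKey and silently create a second "singleton", which is the concern raised 
above.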

bq. How about size (length) of key value? Is it allowed to be zero?

It depends on where in the framework you're looking. The OutputFormat defines 
how to encode/handle null/NullWritable keys from the reduce (or the map if 
you're running without reduces). In 0.17, intermediate data is stored in 
SequenceFiles, so zero-length keys can't be emitted from the map. In 0.18, 
zero-length keys are supported, but their semantics are kind of odd. In most 
cases, emitting NullWritable keys from the map is not a scalable design.
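
To see why NullWritable map output keys don't scale, here is a minimal sketch of 
hash partitioning, assuming the default HashPartitioner rule 
(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks and that NullWritable's 
hash code is constant (modelled as 0). PartitionSketch and its methods are 
hypothetical and operate on precomputed hash codes rather than Hadoop keys.

```java
// Hypothetical model of hash partitioning over precomputed key hash codes.
final class PartitionSketch {
    private PartitionSketch() {}

    // Mask the sign bit so negative hash codes still map to a valid partition.
    static int partitionFor(int keyHash, int numReduceTasks) {
        return (keyHash & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Count how many records each reduce task would receive.
    static int[] recordsPerPartition(int[] keyHashes, int numReduceTasks) {
        int[] counts = new int[numReduceTasks];
        for (int h : keyHashes) {
            counts[partitionFor(h, numReduceTasks)]++;
        }
        return counts;
    }
}
```

With a constant hash, recordsPerPartition(new int[1000], 8) assigns all 1000 
records to partition 0, so one reduce task receives the entire map output no 
matter how many reduces are configured.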

bq. And why does WritableComparator call the newInstance method, when this 
causes issues with any class that has a non-public constructor?

Most WritableComparable types use a RawComparator, which performs much better 
and makes this consideration irrelevant. Unfortunately, 
WritableComparator creates new instances of its internal keys whether it 
requires them or not! This is easily remedied. This patch does the following:

* No longer creates instances of the WritableComparable in WritableComparator 
when a class has registered a WritableComparator (nor does it create a 
buffer). This makes super.compare(byte[], off1, len1, byte[], off2, len2) 
illegal, but I doubt this is a problem. One could imagine a raw comparator that 
attempts an efficient comparison and falls back to the slow comparator when the 
result is ambiguous, but such a comparator is easily adapted.
* Lets WritableComparators be configurable, so WritableComparable objects not 
defining RawComparators are still configured before being compared
* Defines a raw comparator for NullWritable
* Changes checks in SequenceFile Writer classes to check only for key lengths 
less than zero; this doesn't require any changes to the Reader, which already 
supports zero-length keys, so the SequenceFile version doesn't need to be 
adjusted, either.
* Adds a test case for reading/writing NullWritable keys.
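
A minimal sketch of three of the changes listed above, under stated assumptions: 
a comparator base that creates key instances only on request, a raw comparator 
for a zero-length key type, and a writer-side length check that rejects only 
negative lengths. SketchRawComparator, ZeroLengthKeyComparator, and 
KeyLengthCheck are hypothetical names, not the classes or signatures in the 
patch.

```java
import java.util.function.Supplier;

// Hypothetical comparator base: key instances are created only when requested,
// so a key type with a private constructor never has to be instantiated.
abstract class SketchRawComparator<K> {
    protected final K key1;
    protected final K key2;

    protected SketchRawComparator(Supplier<K> keyFactory, boolean createInstances) {
        this.key1 = createInstances ? keyFactory.get() : null;
        this.key2 = createInstances ? keyFactory.get() : null;
    }

    // Compare two serialized keys directly on their bytes.
    abstract int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
}

// Raw comparator for a zero-length key type such as NullWritable:
// every serialized key is empty, so any two keys compare equal.
final class ZeroLengthKeyComparator extends SketchRawComparator<Object> {
    ZeroLengthKeyComparator() {
        // No instances are requested, so this factory must never run.
        super(() -> { throw new IllegalStateException("key factory should not be called"); }, false);
    }

    @Override
    int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        if (l1 != 0 || l2 != 0) {
            throw new IllegalArgumentException("expected zero-length keys");
        }
        return 0;
    }
}

// Writer-side check: zero-length keys pass, negative lengths are rejected.
final class KeyLengthCheck {
    private KeyLengthCheck() {}

    static void check(int keyLength) {
        if (keyLength < 0) {
            throw new IllegalArgumentException("negative length keys not allowed: " + keyLength);
        }
    }
}
```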

> WritableComparator newKey() fails for NullWritable
> --------------------------------------------------
>
>                 Key: HADOOP-3665
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3665
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.16.0, 0.16.1, 0.16.2, 0.16.3, 0.16.4, 0.17.0
>         Environment: n/a
>            Reporter: Lukas Vlcek
>            Priority: Minor
>             Fix For: 0.19.0
>
>         Attachments: 3665-0.patch, HADOOP-3665.path
>
>
> It is not possible to use NullWritable as a key in order to suppress the key 
> in the output.
> Symptomatic exception:
> Caused by: java.lang.IllegalAccessException: Class 
> org.apache.hadoop.io.WritableComparator can not access a member of class 
> org.apache.hadoop.io.NullWritable with modifiers "private"
> The problem is that NullWritable is a singleton and does not provide a public 
> no-argument constructor. The following code in WritableComparator causes 
> the exception: return (WritableComparable)keyClass.newInstance();
> A simple proposed solution is to use ReflectionUtils instead (it requires 
> modification as well).
> This issue is probably related to HADOOP-2922
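
The reported failure can be reproduced without Hadoop. A minimal sketch, 
assuming a stand-in class with only a private constructor: a plain reflective 
call throws IllegalAccessException, while calling setAccessible(true) first 
(the approach ReflectionUtils-style helpers take) succeeds but builds a second 
instance. PrivateSingleton and ReflectiveInstantiation are hypothetical names.

```java
import java.lang.reflect.Constructor;

// Stand-in for a singleton like NullWritable: its only constructor is private.
// Kept as a separate top-level class so reflective access is actually checked.
final class PrivateSingleton {
    static final PrivateSingleton INSTANCE = new PrivateSingleton();
    private PrivateSingleton() {}
}

final class ReflectiveInstantiation {
    private ReflectiveInstantiation() {}

    // Mirrors the failing path: the private modifier is enforced.
    static Object plainNewInstance(Class<?> type) throws ReflectiveOperationException {
        return type.getDeclaredConstructor().newInstance();
    }

    // Mirrors the workaround: suppress the access check before constructing.
    static Object accessibleNewInstance(Class<?> type) throws ReflectiveOperationException {
        Constructor<?> ctor = type.getDeclaredConstructor();
        ctor.setAccessible(true);
        return ctor.newInstance();
    }
}
```

The second method "works", but the object it returns is not 
PrivateSingleton.INSTANCE, which is why the discussion above turns to 
registering singleton instances rather than relying on setAccessible.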

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.