Hi all,
While working on GORA-174 ([0] GORA compiler does not handle
["string", "null"] unions in the AVRO schema), Lewis pointed out
that we ("I" especially ;) should stick to the requirements of the issue.
Without a doubt this is true!
I would like to open a short (short short!) debate about that specification,
because I feel reluctant to go ahead without an acknowledgement (and Lewis
suggested asking everyone). Here is Nutch's WebPage schema as an example:
{
  "type": "record",
  "name": "WebPage",
  "namespace": "org.apache.gora.examples.generated",
  "fields" : [
    {"name": "url", "type": "string"},
    {"name": "content", "type": ["null","bytes"]},
    {"name": "parsedContent", "type": {"type":"array", "items": "string"}},
    {"name": "outlinks", "type": {"type":"map", "values":"string"}},
    {"name": "metadata", "type": {
      "name": "Metadata",
      "type": "record",
      "namespace": "org.apache.gora.examples.generated",
      "fields": [
        {"name": "version", "type": "int"},
        {"name": "data", "type": {"type": "map", "values": "string"}}
      ]
    }}
  ]
}
I have just noticed that in the original issue, NUTCH-1477 [1], the problem
was about a ["null","bytes"] union, so I think we should not limit ourselves
to solving only ["null","string"].
In the schema shown above, "metadata" is mandatory, and GORA-174 says
nothing about optional records. Maybe we should fix that too.
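For illustration, an optional "metadata" would presumably be declared as a
union of "null" and the record itself. This is only a sketch of that one
field in standard Avro syntax, not something GORA-174 currently asks for:

```json
{"name": "metadata", "type": ["null", {
  "name": "Metadata",
  "type": "record",
  "namespace": "org.apache.gora.examples.generated",
  "fields": [
    {"name": "version", "type": "int"},
    {"name": "data", "type": {"type": "map", "values": "string"}}
  ]
}]}
```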
One more thing: the ["null","string"] requirement implies that nested
records must handle it too. In the example above, "Metadata : data" should
allow a map of ["null","string"] and, *supposing "Metadata : version"
were a String*, "Metadata : version" should allow the type ["null","string"].
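As a sketch, the nested record would then look something like this (with
"version" changed to a string only for the sake of the example; it is an
"int" in the real schema):

```json
{
  "name": "Metadata",
  "type": "record",
  "namespace": "org.apache.gora.examples.generated",
  "fields": [
    {"name": "version", "type": ["null", "string"]},
    {"name": "data", "type": {"type": "map", "values": ["null", "string"]}}
  ]
}
```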
If this is not desired, we will have to redefine the issue's requirements,
for example as something like: "allow [null,String] on topmost record
fields".
===============
Taking ONLY the GORA-174 title, ["null","string"], I will have to make these
modifications:
- Modify Nutch's webpage.avsc. "content" will have to be mandatory :(
- Modify the tests, specifically testGetNested(), to check nested
["null","string"]. I think the Cassandra module will not pass this test.
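The first change would amount to something like this in the schema (a sketch
of the one affected field only, assuming unions other than ["null","string"]
stay unsupported; the rest of the record is unchanged):

```json
{"name": "content", "type": "bytes"}
```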
===============
Lewis talked about creating separate issues for nested and multitype unions.
It's not my view, but I will go along with the common decision :)
Opinions?
Thanks at least for reading and getting to this line! :)
Regards,
Alfonso Nishikwa
[0] - https://issues.apache.org/jira/browse/GORA-174
[1] - https://issues.apache.org/jira/browse/NUTCH-1477
--
"Drinking bloody marys all night will make you feel like a corpse in the
morning."