Jerry Adair created ORC-541:
-------------------------------
Summary: Extend CHAR behavior to STRING
Key: ORC-541
URL: https://issues.apache.org/jira/browse/ORC-541
Project: ORC
Issue Type: Improvement
Components: C++
Affects Versions: 1.5.6
Reporter: Jerry Adair
Fix For: 1.5.7
This issue is a dual-purpose animal of sorts: I'd like to offer a suggestion
and a contribution to satisfy that suggestion, as well as to ask a question.
The question is why the ORC types CHAR and VARCHAR are processed differently
from STRING. I'm guessing there was a reason, but I'm not certain what that
reason might be.
The specific area I am addressing is the maxLength attribute of the TypeImpl
class. With CHAR and VARCHAR, a user can define this maxLength attribute, but
with STRING they cannot. Granted, there is a "convenience method", if you will,
but only for the CHAR type:
ORC_UNIQUE_PTR<Type> createCharType(TypeKind kind,
uint64_t maxLength);
In my lil' test program, I used this like so:
container->addStructField(std::string("char column"),
                          createCharType(orc::TypeKind::CHAR, 20));
So at a minimum, it would seem there should be an equivalent for the VARCHAR
type. However, I was able to "get crafty" and create one as follows:
container->addStructField(std::string("varchar column"),
    std::unique_ptr<Type>(new TypeImpl(orc::TypeKind::VARCHAR, 20)));
These produce a type of char(20) and varchar(20), respectively, and the
getMaximumLength() method returns 20 for both.
However, none of this works for the STRING type. As with VARCHAR, there is no
"convenience method", and an attempt similar to the VARCHAR one above:
container->addStructField(std::string("string column"),
    std::unique_ptr<Type>(new TypeImpl(orc::TypeKind::STRING, 20)));
did not produce the result I expected. It was easy to see why the output type
was just "string": that is evident in the toString() method. However, I was a
bit surprised that getMaximumLength() returned 0 even though I used the second
variant of the TypeImpl constructor, i.e., the one that sets maxLength via its
second parameter.
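To make the toString() observation concrete, here is a simplified,
self-contained model of the behavior described above. It is NOT ORC's source:
the enum Kind and the class ToyType are hypothetical stand-ins, and the model
only assumes that the string form appends the length for CHAR and VARCHAR but
not for STRING. It does not attempt to explain the 0 returned by
getMaximumLength(), which is left uninvestigated above.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for orc::TypeKind, limited to the kinds discussed here.
enum class Kind { STRING, CHAR, VARCHAR };

// Hypothetical, simplified model of a primitive type that carries maxLength.
// This is not orc::TypeImpl; it only mirrors the toString() behavior reported.
class ToyType {
 public:
  ToyType(Kind kind, uint64_t maxLength) : kind_(kind), maxLength_(maxLength) {}

  // The length is rendered only for CHAR and VARCHAR. STRING ignores it,
  // so a STRING constructed with a length of 20 still prints as "string".
  std::string toString() const {
    switch (kind_) {
      case Kind::CHAR:
        return "char(" + std::to_string(maxLength_) + ")";
      case Kind::VARCHAR:
        return "varchar(" + std::to_string(maxLength_) + ")";
      case Kind::STRING:
        return "string";
    }
    return "unknown";  // unreachable for the kinds above
  }

 private:
  Kind kind_;
  uint64_t maxLength_;
};
```

With this model, ToyType(Kind::CHAR, 20).toString() gives "char(20)" while
ToyType(Kind::STRING, 20).toString() gives just "string", matching what I saw.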
Unfortunately, I didn't have time to dig into why that was happening, but I'd
seen enough to warrant an issue report, albeit not a critical one.
All that said, as a user of ORC, I'd like to see the STRING type handled in the
same manner as CHAR and VARCHAR, with convenience methods for both VARCHAR and
STRING, as there is for CHAR. Or, at the least, I'd like to learn why there is
only the one convenience method and why STRING is treated so differently. We
could use this functionality in our project (which uses ORC), and that is why
I am opening this issue in the first place.
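For discussion, the requested convenience methods might look roughly like the
following. This is a hypothetical, self-contained sketch: SketchKind,
SketchType, createVarcharTypeSketch and createStringTypeSketch are illustrative
names, not part of ORC's API. Rendering a bounded STRING as "string(20)" is an
assumption about the desired behavior, not something ORC's type-string format
is known to accept.

```cpp
#include <cstdint>
#include <memory>
#include <string>

// Hypothetical stand-in for orc::TypeKind, limited to the string-like kinds.
enum class SketchKind { STRING, CHAR, VARCHAR };

// Hypothetical type that keeps maxLength for all three string-like kinds.
struct SketchType {
  SketchKind kind;
  uint64_t maxLength;  // in this sketch, 0 means "no declared limit"

  uint64_t getMaximumLength() const { return maxLength; }

  std::string toString() const {
    std::string name = kind == SketchKind::CHAR      ? "char"
                       : kind == SketchKind::VARCHAR ? "varchar"
                                                     : "string";
    // Assumption: an unbounded STRING keeps printing as plain "string".
    if (kind == SketchKind::STRING && maxLength == 0) return name;
    return name + "(" + std::to_string(maxLength) + ")";
  }
};

// Hypothetical convenience factories, shaped like createCharType.
std::unique_ptr<SketchType> createVarcharTypeSketch(uint64_t maxLength) {
  return std::make_unique<SketchType>(
      SketchType{SketchKind::VARCHAR, maxLength});
}

std::unique_ptr<SketchType> createStringTypeSketch(uint64_t maxLength = 0) {
  return std::make_unique<SketchType>(
      SketchType{SketchKind::STRING, maxLength});
}
```

Defaulting the STRING length to 0 in this sketch keeps existing unbounded
STRING columns unchanged while letting callers opt in to a declared length.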
I'd be willing to contribute the fix, as it seems easy enough to do, but I'll
leave that decision to Owen or other project folks.
Thanks,
Jerry