[ https://issues.apache.org/jira/browse/ORC-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated ORC-541:
------------------------------
    Fix Version/s:     (was: 1.5.7)

> Extend CHAR behavior to STRING
> ------------------------------
>
>                 Key: ORC-541
>                 URL: https://issues.apache.org/jira/browse/ORC-541
>             Project: ORC
>          Issue Type: Improvement
>          Components: C++
>    Affects Versions: 1.5.6
>            Reporter: Jerry Adair
>            Priority: Minor
>
> This issue is a dual-purpose animal of sorts: I'd like to offer a suggestion
> and a contribution to satisfy that suggestion, and also to ask a question.
> The question is why the ORC types CHAR and VARCHAR are processed
> differently from STRING.  I'm guessing that there was a reason, but I'm not
> certain what that reason might be.
>  
> The specific area that I am addressing is the maxLength attribute of the
> TypeImpl class.  With CHAR and VARCHAR, a user can define this maxLength
> attribute, but with STRING they cannot.  Granted, there is a "convenience
> method", if you will, for only the CHAR class, thus:
>     ORC_UNIQUE_PTR<Type> createCharType(TypeKind kind, uint64_t maxLength);
> In my lil' test program, I used this like so:
>     container->addStructField(std::string("char column"),
>                               createCharType(orc::TypeKind::CHAR, 20));
>  
> So at a minimum it would seem that there should be an equivalent for the
> VARCHAR type.  However, I was able to "get crafty" and create one via the
> following:
>     container->addStructField(
>         std::string("varchar column"),
>         std::unique_ptr<Type>(new TypeImpl(orc::TypeKind::VARCHAR, 20)));
>  
> These produce types of char(20) and varchar(20) respectively, and the
> getMaximumLength() method returns 20 for each.
>  
> However, none of this works for the STRING type.  As with VARCHAR, there is
> no "convenience method", and an attempt similar to the VARCHAR one shown
> above, thus:
>     container->addStructField(
>         std::string("string column"),
>         std::unique_ptr<Type>(new TypeImpl(orc::TypeKind::STRING, 20)));
> failed to produce the result I would have expected.  It was easy to see why
> the output type was just "string"; that is readily seen in the toString()
> method.  However, I was a bit surprised to see that getMaximumLength()
> returned 0 when I used the second variant of the TypeImpl constructor, i.e.
> the one that sets maxLength via the second parameter.
>  
> Unfortunately I didn't have time to dig into why that was happening, but I'd 
> seen enough to warrant an issue report, albeit not of critical importance.
>  
> All that said, as a user of ORC, I'd like to see the STRING type handled in
> the same manner as the CHAR or VARCHAR type, with convenience methods for
> both STRING and VARCHAR, as there is for CHAR.  Or at least learn why there
> is only the one convenience method and why STRING is treated so
> differently.  We could use this functionality in our project (in which we
> use ORC), which is the reason I am opening this issue in the first place.
>  
> I'd be willing to contribute the fix, as it seems easy enough to do.  But 
> I'll leave that up to Owen or other project folk to decide.
>  
> Thanks,
> Jerry
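
To make the request in the description concrete, here is a minimal,
self-contained C++ sketch of the behavior being asked for: a string-like type
that keeps its maximum length whichever kind it has.  Everything in it (Kind,
SketchType, createBoundedType, and the "string(20)" rendering) is hypothetical
and for illustration only.  It is a toy model, not the ORC TypeImpl class or
any part of the ORC API, and it makes no claim about why ORC reports 0 for
STRING.

```cpp
#include <cstdint>
#include <memory>
#include <string>

// Hypothetical stand-in for the three string-like kinds discussed above.
// This is NOT orc::TypeKind.
enum class Kind { STRING, VARCHAR, CHAR };

// Hypothetical toy type that stores a maximum length for every kind,
// including STRING. This is NOT orc::TypeImpl.
class SketchType {
 public:
  SketchType(Kind kind, uint64_t maxLength)
      : kind_(kind), maxLength_(maxLength) {}

  Kind getKind() const { return kind_; }

  // Returns whatever length was passed to the constructor, for all kinds.
  uint64_t getMaximumLength() const { return maxLength_; }

  // Renders the type name. The "string(N)" form is an assumption made for
  // this sketch; a length of 0 is treated as "unbounded" and prints "string".
  std::string toString() const {
    const std::string suffix = "(" + std::to_string(maxLength_) + ")";
    switch (kind_) {
      case Kind::CHAR:
        return "char" + suffix;
      case Kind::VARCHAR:
        return "varchar" + suffix;
      case Kind::STRING:
        if (maxLength_ == 0) {
          return "string";
        }
        return "string" + suffix;
    }
    return "unknown";
  }

 private:
  Kind kind_;
  uint64_t maxLength_;
};

// Hypothetical convenience factory covering all three kinds, in the spirit
// of the createCharType(kind, maxLength) signature quoted in the description.
std::unique_ptr<SketchType> createBoundedType(Kind kind, uint64_t maxLength) {
  return std::unique_ptr<SketchType>(new SketchType(kind, maxLength));
}
```

With these definitions, createBoundedType(Kind::STRING, 20) reports a maximum
length of 20 and renders as "string(20)", while a length of 0 renders as plain
"string".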



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
