[ 
https://issues.apache.org/jira/browse/PHOENIX-66?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13908344#comment-13908344
 ] 

Bruno Dumon commented on PHOENIX-66:
------------------------------------

Sounds good. So if I got that right, the start and end of the array would 
simply be detected based on having a value starting with [ and one ending with 
].

One disadvantage is that it looks a bit strange when quoting is enabled:

"a", "b", "c", "\[foo", "1", "bar\]", "d"

If the array would happen to consist of a single element, brackets wouldn't be 
necessary.

In case the 4'th element wouldn't be declared as an array, it would be valid to 
have just a [ in the value, which might be confusing as well:

"a", "b", "c", "\[foo", "d"

I'm don't think any of these are real problems though, and I don't have any 
other proposal. We're probably stretching the limits of what CSV can do.

I might work on this -- I'll let you know once I get to it.

> Support array creation from CSV file
> ------------------------------------
>
>                 Key: PHOENIX-66
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-66
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>             Fix For: 3.0.0
>
>
> We should support being able to parse an array defined in our CVS file. 
> Perhaps something like this:
> a, b, c, [foo, 1, bar], d
> We'd know (from the data type of the column), that we have an array for the 
> fourth field here.
> One option to support this would be to implement the 
> PDataType.toObject(String) for the ARRAY PDataType enums. That's not ideal, 
> though, as we'd introduce a dependency from PDataType to our CSVLoader, since 
> we'd need to in turn parse each element. Also, we don't have a way to pass 
> through the custom delimiters that might be in use.
> Another pretty trivial, though a bit more constrained approach would be to 
> look at the column ARRAY_SIZE to control how many of the next CSV columns 
> should be used as array elements. In this approach, you wouldn't use the 
> square brackets at all. You can get the ARRAY_SIZE from the column metadata 
> through connection.getMetaData().getColumns() call, through 
> resultSet.getInt("ARRAY_SIZE"); However, the ARRAY_SIZE is optional in a DDL 
> statement, so we'd need to do something for the case where it's not specified.
> A third option would be to handle most of the parsing in the CSVLoader. We 
> could use the above bracket syntax, and then collect up the next set of CSV 
> field elements until we hit the unescaped ']'. Then we'd use our standard 
> JDBC APIs to build the array and continue on our merry way.
> What do you think, [~jviolettedsiq]? Or [~bruno], maybe you can take a crack 
> at it?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

Reply via email to