[
https://issues.apache.org/jira/browse/PHOENIX-66?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13908344#comment-13908344
]
Bruno Dumon commented on PHOENIX-66:
------------------------------------
Sounds good. So if I got that right, the start and end of the array would
simply be detected based on having a value starting with [ and one ending with
].
One disadvantage is that it looks a bit strange when quoting is enabled:
"a", "b", "c", "[foo", "1", "bar]", "d"
If the array would happen to consist of a single element, brackets wouldn't be
necessary.
If the 4th column weren't declared as an array, it would be valid to have
just a [ in the value, which might be confusing as well:
"a", "b", "c", "[foo", "d"
I don't think any of these are real problems though, and I don't have any
other proposal. We're probably stretching the limits of what CSV can do.
I might work on this -- I'll let you know once I get to it.
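For illustration, here is a rough sketch of the grouping step as I understand
it. It is not Phoenix code: the class name ArrayFieldGrouper and the
arrayColumns parameter are hypothetical, and it assumes the CSV parser has
already split and unquoted the fields. It does not handle escaped brackets or
empty arrays.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class ArrayFieldGrouper {

    /**
     * Groups already-parsed CSV fields into logical column values.
     * arrayColumns (hypothetical) holds the zero-based indexes of the
     * columns declared as ARRAY; those yield a List<String>, all other
     * columns yield the plain String.
     */
    public static List<Object> group(List<String> fields, Set<Integer> arrayColumns) {
        List<Object> row = new ArrayList<>();
        int i = 0;
        while (i < fields.size()) {
            String field = fields.get(i).trim();
            int column = row.size();
            if (!arrayColumns.contains(column)) {
                // Not an array column: a leading [ is just part of the value.
                row.add(field);
                i++;
                continue;
            }
            List<String> elements = new ArrayList<>();
            if (!field.startsWith("[")) {
                // Single-element array written without brackets.
                elements.add(field);
                i++;
            } else {
                field = field.substring(1);
                while (true) {
                    if (field.endsWith("]")) {
                        // Closing bracket found: last element of the array.
                        elements.add(field.substring(0, field.length() - 1).trim());
                        i++;
                        break;
                    }
                    elements.add(field);
                    i++;
                    if (i >= fields.size()) {
                        throw new IllegalArgumentException(
                                "Unterminated array in column " + column);
                    }
                    field = fields.get(i).trim();
                }
            }
            row.add(elements);
        }
        return row;
    }
}
```

With column 3 declared as an array, the first example line above groups into
five values, the fourth being the list foo, 1, bar. With no array columns, the
second example keeps "[foo" as an ordinary string.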
> Support array creation from CSV file
> ------------------------------------
>
> Key: PHOENIX-66
> URL: https://issues.apache.org/jira/browse/PHOENIX-66
> Project: Phoenix
> Issue Type: Bug
> Reporter: James Taylor
> Fix For: 3.0.0
>
>
> We should support being able to parse an array defined in our CSV file.
> Perhaps something like this:
> a, b, c, [foo, 1, bar], d
> We'd know (from the data type of the column), that we have an array for the
> fourth field here.
> One option to support this would be to implement the
> PDataType.toObject(String) for the ARRAY PDataType enums. That's not ideal,
> though, as we'd introduce a dependency from PDataType to our CSVLoader, since
> we'd need to in turn parse each element. Also, we don't have a way to pass
> through the custom delimiters that might be in use.
> Another pretty trivial, though a bit more constrained approach would be to
> look at the column ARRAY_SIZE to control how many of the next CSV columns
> should be used as array elements. In this approach, you wouldn't use the
> square brackets at all. You can get the ARRAY_SIZE from the column metadata
> returned by the connection.getMetaData().getColumns() call, using
> resultSet.getInt("ARRAY_SIZE"). However, the ARRAY_SIZE is optional in a DDL
> statement, so we'd need to do something for the case where it's not specified.
> A third option would be to handle most of the parsing in the CSVLoader. We
> could use the above bracket syntax, and then collect up the next set of CSV
> field elements until we hit the unescaped ']'. Then we'd use our standard
> JDBC APIs to build the array and continue on our merry way.
> What do you think, [~jviolettedsiq]? Or [~bruno], maybe you can take a crack
> at it?