[ 
https://issues.apache.org/jira/browse/PHOENIX-66?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabriel Reid updated PHOENIX-66:
--------------------------------

    Attachment: PHOENIX-66.patch

Here's the patch to add array support to the CSV import. As mentioned 
previously, it separates out the conversion of CSV to upserts to make it 
reusable the MR-based import, as well as adding a bit more testing for a number 
of things.

I abandoned the changes to the MR-based import for this ticket; as I looked 
into adding some automated tests and other general improvements in the MR 
importer to allow handling arrays, it became apparent that it was a pretty 
involved issue on its own, so I've made a separate ticket (PHOENIX-129) to 
handle that, and I'll try to get that patch posted in the next couple of days, 
if not sooner.

> Support array creation from CSV file
> ------------------------------------
>
>                 Key: PHOENIX-66
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-66
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>             Fix For: 3.0.0
>
>         Attachments: PHOENIX-66-intermediate.patch, PHOENIX-66.patch
>
>
> We should support being able to parse an array defined in our CVS file. 
> Perhaps something like this:
> a, b, c, [foo, 1, bar], d
> We'd know (from the data type of the column), that we have an array for the 
> fourth field here.
> One option to support this would be to implement the 
> PDataType.toObject(String) for the ARRAY PDataType enums. That's not ideal, 
> though, as we'd introduce a dependency from PDataType to our CSVLoader, since 
> we'd need to in turn parse each element. Also, we don't have a way to pass 
> through the custom delimiters that might be in use.
> Another pretty trivial, though a bit more constrained approach would be to 
> look at the column ARRAY_SIZE to control how many of the next CSV columns 
> should be used as array elements. In this approach, you wouldn't use the 
> square brackets at all. You can get the ARRAY_SIZE from the column metadata 
> through connection.getMetaData().getColumns() call, through 
> resultSet.getInt("ARRAY_SIZE"); However, the ARRAY_SIZE is optional in a DDL 
> statement, so we'd need to do something for the case where it's not specified.
> A third option would be to handle most of the parsing in the CSVLoader. We 
> could use the above bracket syntax, and then collect up the next set of CSV 
> field elements until we hit the unescaped ']'. Then we'd use our standard 
> JDBC APIs to build the array and continue on our merry way.
> What do you think, [~jviolettedsiq]? Or [~bruno], maybe you can take a crack 
> at it?



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to