[ https://issues.apache.org/jira/browse/SQOOP-3267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16337301#comment-16337301 ]
Szabolcs Vasas commented on SQOOP-3267:
---------------------------------------

*re: "or every column, but I've already addressed this issue in [^SQOOP-3267.1.patch] (see first comment on this issue)."*

Sorry, I missed this. It is a nice improvement!

Even if we ignore the slight performance overhead, the problem with a default null string would be that the output HBase table of a regular import would be different (we would get defined columns containing empty strings instead of undefined columns), and this behavior change is a bit unexpected in a bug JIRA. It would solve this particular bug but could lead to confusion in the future.

I am not sure I understand how you would split up the work between the two JIRAs, and I wasn't really clear in my previous comment, so let me summarize what I suggest:
* This JIRA would add the --hbase-null-incremental-mode option with two possible values: ignore (default) and delete. This would basically restore the behavior we had prior to SQOOP-3149 while keeping the functionality SQOOP-3149 intended to introduce. It would be a fairly localized change that would not affect users who do not do incremental imports at all.
* Another JIRA would introduce a new possible value (null-string) for --hbase-null-incremental-mode and a new option, --hbase-null-string, to specify the string to write. I think this change should be classified as a new feature. --hbase-null-string could be usable with regular imports too, but if the user does not specify it, we should stick to the current behavior and not insert any null string into columns that are null in the RDBMS.
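A minimal Java sketch of the semantics proposed above, assuming a source row arrives as a column-name-to-value map with `null` standing for SQL NULL. `NullIncrementalMode`, `RowMutations`, `NullHandling.toMutations` and the `nullString` parameter are hypothetical names for illustration only, not Sqoop code. The sketch does not talk to HBase; it only shows which columns would go into the single per-row Put and which would be deleted under each mode, with `NULL_STRING` modelling the follow-up feature JIRA.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the proposed --hbase-null-incremental-mode values.
// NULL_STRING stands for the value suggested for the follow-up feature JIRA.
enum NullIncrementalMode { IGNORE, DELETE, NULL_STRING }

// Hypothetical container for the mutations derived from one source row:
// all non-null columns are collected into ONE put (one Put per row),
// plus the list of columns that should be deleted.
final class RowMutations {
    final Map<String, String> put = new LinkedHashMap<>();
    final List<String> deleteColumns = new ArrayList<>();
}

final class NullHandling {
    private NullHandling() {}

    // Maps a source row to mutations; nullString is only used in NULL_STRING mode.
    static RowMutations toMutations(Map<String, String> sourceRow,
                                    NullIncrementalMode mode,
                                    String nullString) {
        RowMutations m = new RowMutations();
        for (Map.Entry<String, String> col : sourceRow.entrySet()) {
            String value = col.getValue();
            if (value != null) {
                // Non-null values always go into the single per-row put.
                m.put.put(col.getKey(), value);
                continue;
            }
            switch (mode) {
                case IGNORE:
                    // Pre-SQOOP-3149 behavior: leave the HBase column untouched.
                    break;
                case DELETE:
                    // Mark the column for deletion (presumably all of its versions).
                    m.deleteColumns.add(col.getKey());
                    break;
                case NULL_STRING:
                    // Follow-up proposal: store a user-supplied marker instead.
                    m.put.put(col.getKey(), nullString);
                    break;
            }
        }
        return m;
    }
}
```

Keeping `IGNORE` as the default means a regular import produces exactly the same HBase table as before; only users who opt into `DELETE` (or, later, a null string) see different output.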
> Incremental import to HBase deletes only last version of column
> ---------------------------------------------------------------
>
>                 Key: SQOOP-3267
>                 URL: https://issues.apache.org/jira/browse/SQOOP-3267
>             Project: Sqoop
>          Issue Type: Bug
>          Components: hbase-integration
>    Affects Versions: 1.4.7
>            Reporter: Daniel Voros
>            Assignee: Daniel Voros
>            Priority: Major
>         Attachments: SQOOP-3267.1.patch
>
>
> Deletes are supported since SQOOP-3149, but we're only deleting the last
> version of a column when the corresponding cell was set to NULL in the source
> table.
> This can lead to unexpected and misleading results if the row has been
> transferred multiple times, which can easily happen if it's being modified on
> the source side.
> Also SQOOP-3149 is using a new Put command for every column instead of a
> single Put per row as before. This could probably lead to a performance drop
> for wide tables (for which HBase is otherwise usually recommended).
> [~jilani], [~anna.szonyi] could you please comment on what you think would be
> the expected behavior here?
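The versioning problem in the description can be illustrated with a toy in-memory model (this is not the HBase client API): each column keeps several timestamped versions and a read returns the newest one. In the HBase client, `Delete.addColumn` deletes only the latest version of a column, while `Delete.addColumns` deletes all of its versions; the hypothetical `deleteLatestVersion` and `deleteAllVersions` methods below mimic those two behaviors, assuming the column family retains more than one version.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy model of a versioned, HBase-style column store (NOT the HBase API).
// Each (row, column) keeps timestamped versions; a read returns the newest one.
class VersionedStore {
    // "row|column" -> (timestamp -> value), sorted by timestamp
    private final Map<String, TreeMap<Long, String>> cells = new HashMap<>();

    private static String key(String row, String column) {
        return row + "|" + column;
    }

    // Writes a new version of the cell (one per import of the row).
    void put(String row, String column, long ts, String value) {
        cells.computeIfAbsent(key(row, column), k -> new TreeMap<>()).put(ts, value);
    }

    // Returns the newest version, or null if the column has no versions left.
    String get(String row, String column) {
        TreeMap<Long, String> versions = cells.get(key(row, column));
        return (versions == null || versions.isEmpty()) ? null : versions.lastEntry().getValue();
    }

    // Mimics deleting only the latest version (the behavior reported as a bug).
    void deleteLatestVersion(String row, String column) {
        TreeMap<Long, String> versions = cells.get(key(row, column));
        if (versions != null && !versions.isEmpty()) {
            versions.pollLastEntry();
        }
    }

    // Mimics deleting every version, so no older value can resurface.
    void deleteAllVersions(String row, String column) {
        cells.remove(key(row, column));
    }
}
```

For example, if two imports wrote `alice` (timestamp 1) and then `alice2` (timestamp 2) and the source value is later set to NULL, `deleteLatestVersion` leaves `get` returning the stale `alice`, whereas `deleteAllVersions` makes it return `null`.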