[ https://issues.apache.org/jira/browse/HIVE-24855?focusedWorklogId=570100&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-570100 ]
ASF GitHub Bot logged work on HIVE-24855:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 22/Mar/21 22:19
Start Date: 22/Mar/21 22:19
Worklog Time Spent: 10m
Work Description: jcamachor commented on a change in pull request #2046:
URL: https://github.com/apache/hive/pull/2046#discussion_r599094572
##########
File path: llap-server/src/java/org/apache/hadoop/hive/llap/io/api/impl/LlapInputFormat.java
##########
@@ -182,11 +190,11 @@ static VectorizedRowBatchCtx createFakeVrbCtx(MapWork mapWork) throws HiveExcept
RowSchema rowSchema = findTsOp(mapWork).getSchema();
final List<String> colNames = new ArrayList<String>(rowSchema.getSignature().size());
final List<TypeInfo> colTypes = new ArrayList<TypeInfo>(rowSchema.getSignature().size());
- boolean hasRowId = false;
+ ArrayList<VirtualColumn> virtualColumnList = new ArrayList<>(2);
for (ColumnInfo c : rowSchema.getSignature()) {
String columnName = c.getInternalName();
- if (VirtualColumn.ROWID.getName().equals(columnName)) {
- hasRowId = true;
+ if (ALLOWED_VIRTUAL_COLUMNS.containsKey(columnName)) {
+ virtualColumnList.add(ALLOWED_VIRTUAL_COLUMNS.get(columnName));
} else {
if (VirtualColumn.VIRTUAL_COLUMN_NAMES.contains(columnName)) continue;
Review comment:
Not really part of this patch, but this line can be merged with the previous one.
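The merge the reviewer suggests folds the nested `if (...) continue;` into an `else if` branch. A minimal self-contained sketch of the merged loop, using stand-in collections for Hive's `ALLOWED_VIRTUAL_COLUMNS` map and `VirtualColumn.VIRTUAL_COLUMN_NAMES` set (the string values here are illustrative, not the real `VirtualColumn` objects):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MergedElseIfSketch {
  // Stand-ins for Hive's ALLOWED_VIRTUAL_COLUMNS map and
  // VirtualColumn.VIRTUAL_COLUMN_NAMES set.
  static final Map<String, String> ALLOWED_VIRTUAL_COLUMNS = new HashMap<>();
  static final Set<String> VIRTUAL_COLUMN_NAMES = new HashSet<>(
      Arrays.asList("ROW__ID", "ROW__IS__DELETED", "BLOCK__OFFSET__INSIDE__FILE"));
  static {
    ALLOWED_VIRTUAL_COLUMNS.put("ROW__ID", "rowId");
    ALLOWED_VIRTUAL_COLUMNS.put("ROW__IS__DELETED", "rowIsDeleted");
  }

  // Same behavior as the patch, with the nested "if (...) continue;"
  // folded into an else-if branch as the reviewer suggests.
  static List<String> collectVirtualColumns(List<String> signature, List<String> regularColsOut) {
    List<String> virtualColumnList = new ArrayList<>(2);
    for (String columnName : signature) {
      if (ALLOWED_VIRTUAL_COLUMNS.containsKey(columnName)) {
        virtualColumnList.add(ALLOWED_VIRTUAL_COLUMNS.get(columnName));
      } else if (VIRTUAL_COLUMN_NAMES.contains(columnName)) {
        continue; // other virtual columns are skipped entirely
      } else {
        regularColsOut.add(columnName);
      }
    }
    return virtualColumnList;
  }
}
```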
##########
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##########
@@ -4450,6 +4450,8 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
HIVE_ACID_DIRECT_INSERT_ENABLED("hive.acid.direct.insert.enabled", true,
"Enable writing the data files directly to the table's final destination instead of the staging directory."
+ "This optimization only applies on INSERT operations on ACID tables."),
+ HIVE_ACID_FETCH_DELETED_ROWS("hive.acid.fetch.deleted.rows", false,
Review comment:
I am not sure this should be part of HiveConf.java, since it is not a global/session config per se. Have you considered introducing it in `org.apache.hadoop.hive.conf.Constants`? We have introduced some of these config variables there in the past.
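The `Constants` pattern the reviewer refers to is just a plain string key rather than a full `ConfVars` entry. A sketch of what that addition could look like, shown as a standalone class (the field name is hypothetical; only the key string `hive.acid.fetch.deleted.rows` comes from the patch):

```java
// Hypothetical addition to org.apache.hadoop.hive.conf.Constants,
// shown here as a standalone class for illustration. Per-feature
// keys like this avoid growing HiveConf's global ConfVars enum.
public final class AcidConstantsSketch {
  public static final String HIVE_FETCH_DELETED_ROWS = "hive.acid.fetch.deleted.rows";

  private AcidConstantsSketch() {
    // no instances; constants holder only
  }
}
```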
##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
##########
@@ -1970,8 +1970,22 @@ private int processQueryHint(ASTNode ast, QBParseInfo qbp, int posn) throws Sema
String queryHintStr = ast.getText();
LOG.debug("QUERY HINT: {} ", queryHintStr);
try {
- ASTNode hintNode = pd.parseHint(queryHintStr);
- qbp.setHints(hintNode);
+ ASTNode hintListNode = pd.parseHint(queryHintStr);
+ qbp.setHints(hintListNode);
+ for (int i = 0; i < hintListNode.getChildCount(); ++i) {
+ ASTNode hintNode = (ASTNode) hintListNode.getChild(i);
+ if (hintNode.getChild(0).getType() != HintParser.TOK_FETCH_DELETED_ROWS) {
Review comment:
Should this be a table property rather than a hint, i.e., the value for `hive.acid.fetch.deleted.rows` passed as a property on the table scan? That would follow a similar pattern to other code paths in Hive, e.g., pushing computation for Druid/JDBC. It also means we do not need to introduce a new hint, which will simplify the patch.
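Reading such a flag off the scan's property map rather than from a parsed hint could look roughly like the following. This is a heavily simplified sketch under assumed types: the real table-scan descriptor API is not reproduced here, only the lookup-with-default idiom; the helper name `fetchDeletedRows` is hypothetical.

```java
import java.util.Map;

public class TableScanPropSketch {
  // Key string taken from the patch's HiveConf entry.
  static final String FETCH_DELETED_ROWS = "hive.acid.fetch.deleted.rows";

  // Hypothetical helper: resolve the flag from a table scan's
  // property map, defaulting to false when the property is unset.
  static boolean fetchDeletedRows(Map<String, String> scanProps) {
    return Boolean.parseBoolean(scanProps.getOrDefault(FETCH_DELETED_ROWS, "false"));
  }
}
```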
##########
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##########
@@ -1943,15 +1943,17 @@ public static boolean isFullAcidScan(Configuration conf) {
* we assume this is a full transactional table.
*/
public static void setAcidOperationalProperties(
- Configuration conf, boolean isTxnTable, AcidOperationalProperties properties) {
+ Configuration conf, boolean isTxnTable, AcidOperationalProperties properties, boolean fetchDeletedRows) {
Review comment:
Since in most cases we would pass `false` for the last parameter, should we also keep the old method signature as an overload that calls this method with `false` for the last parameter?
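The overload the reviewer asks for is the standard delegate-with-default pattern. A minimal sketch using `String` stand-ins for Hive's `Configuration` and `AcidOperationalProperties` types (only the method name comes from the patch):

```java
public class OverloadSketch {
  // Records the last invocation so the delegation is observable.
  static String lastCall;

  // Old three-argument signature kept for the common case;
  // it simply delegates with fetchDeletedRows = false.
  static void setAcidOperationalProperties(String conf, boolean isTxnTable, String props) {
    setAcidOperationalProperties(conf, isTxnTable, props, false);
  }

  // New signature from the patch (types simplified to String here).
  static void setAcidOperationalProperties(String conf, boolean isTxnTable, String props,
      boolean fetchDeletedRows) {
    lastCall = conf + ":" + isTxnTable + ":" + props + ":" + fetchDeletedRows;
  }
}
```

Existing callers then compile unchanged, and only the new ROW__IS__DELETED path needs to pass the extra flag explicitly.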
##########
File path: ql/src/java/org/apache/hadoop/hive/ql/io/BatchToRowReader.java
##########
@@ -108,25 +108,41 @@ public BatchToRowReader(RecordReader<NullWritable, VectorizedRowBatch> vrbReader
} else {
Arrays.fill(included, true);
}
- // Create struct for ROW__ID virtual column and extract index
- this.rowIdIdx = vrbCtx.findVirtualColumnNum(VirtualColumn.ROWID);
- if (this.rowIdIdx >= 0) {
- included[rowIdIdx] = true;
+
+ virtualColumnHandlers = requestedVirtualColumns();
+ for (VirtualColumnHandler handler : virtualColumnHandlers) {
+ int idx = vrbCtx.findVirtualColumnNum(handler.virtualColumn);
+ if (idx >= 0) {
+ included[idx] = true;
+ handler.indexInSchema = idx;
+ }
}
+
if (LOG.isDebugEnabled()) {
LOG.debug("Including the columns " + DebugUtils.toString(included));
}
this.included = included;
}
+ public static class VirtualColumnHandler {
Review comment:
Can you add a short comment for the inner class?
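Based only on what the diff shows (a `virtualColumn` field and an `indexInSchema` field set from `vrbCtx.findVirtualColumnNum`), the requested comment could read along these lines. The `VirtualColumn` enum below is a stub so the sketch is self-contained; the Javadoc wording is a suggestion, not the actual patch text.

```java
public class HandlerDocSketch {
  // Stub for Hive's VirtualColumn, included only to make this compile.
  enum VirtualColumn { ROWID, ROWISDELETED }

  /**
   * Pairs a requested virtual column with the position it occupies in the
   * read schema; {@code indexInSchema} stays -1 when the column is not
   * projected into the batch.
   */
  public static class VirtualColumnHandler {
    final VirtualColumn virtualColumn;
    int indexInSchema = -1;

    VirtualColumnHandler(VirtualColumn virtualColumn) {
      this.virtualColumn = virtualColumn;
    }
  }
}
```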
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 570100)
Time Spent: 20m (was: 10m)
> Introduce virtual column ROW__IS__DELETED
> -----------------------------------------
>
> Key: HIVE-24855
> URL: https://issues.apache.org/jira/browse/HIVE-24855
> Project: Hive
> Issue Type: New Feature
> Reporter: Krisztian Kasa
> Assignee: Krisztian Kasa
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)