[
https://issues.apache.org/jira/browse/HIVE-26294?focusedWorklogId=792415&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-792415
]
ASF GitHub Bot logged work on HIVE-26294:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 19/Jul/22 02:30
Start Date: 19/Jul/22 02:30
Worklog Time Spent: 10m
Work Description: kasakrisz commented on code in PR #3433:
URL: https://github.com/apache/hive/pull/3433#discussion_r923998397
##########
ql/src/java/org/apache/hadoop/hive/ql/udf/UDFSubstr.java:
##########
@@ -64,6 +65,27 @@ public UDFSubstr() {
r = new Text();
}
+ public Text evaluate(Text t, LongWritable pos, LongWritable len) {
+
+ if ((t == null) || (pos == null) || (len == null)) {
+ return null;
+ }
+
+ r.clear();
+ if ((len.get() <= 0)) {
+ return r;
+ }
+
+ String s = t.toString();
+ int[] index = makeIndex(pos.get(), len.get(), s.length());
+ if (index == null) {
+ return r;
+ }
+
+ r.set(s.substring(index[0], index[1]));
+ return r;
+ }
Review Comment:
Java String accepts int parameters only
https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#substring-int-int-
How about creating a common `evaluate` which receives int parameters and
both the `LongWritable` and `IntWritable` version call it?
```
public Text evaluate(Text t, IntWritable pos, IntWritable len) {
if ((t == null) || (pos == null) || (len == null)) {
return null;
}
return evaluate(t, pos.get(), len.get());
}
public Text evaluate(Text t, LongWritable pos, LongWritable len) {
if ((t == null) || (pos == null) || (len == null)) {
return null;
}
// Handle long int overflow
return evaluate(t, (int)pos.get(), (int)len.get());
}
private Text evaluate(Text t, int pos, int len) {
...
}
```
##########
ql/src/java/org/apache/hadoop/hive/ql/udf/UDFSubstr.java:
##########
@@ -85,6 +107,31 @@ public Text evaluate(Text t, IntWritable pos, IntWritable
len) {
return r;
}
+ private int[] makeIndex(long pos, long len, int inputLen) {
+ if ((Math.abs(pos) > inputLen)) {
+ return null;
+ }
+
+ long start, end;
+
+ if (pos > 0) {
+ start = pos - 1;
+ } else if (pos < 0) {
+ start = inputLen + pos;
+ } else {
+ start = 0;
+ }
+
+ if ((inputLen - start) < len) {
+ end = inputLen;
+ } else {
+ end = start + len;
+ }
+ index[0] = (int) start;
+ index[1] = (int) end;
+ return index;
+ }
Review Comment:
This is unnecessary if both `LongWritable` and `IntWritable` `evaluate` call
the common one. See my previous comment.
##########
ql/src/test/queries/clientpositive/udf_substr.q:
##########
@@ -76,3 +76,8 @@ SELECT
substr("abc 玩玩玩 abc", 5),
substr("abc 玩玩玩 abc", 5, 3)
FROM src tablesample (1 rows);
+
+SELECT
+ substr('ABC', cast(1 as bigint), cast(0 as bigint)),
+ substr('ABC', cast(4 as bigint))
+FROM src tablesample (1 rows);
Review Comment:
Could you please add a test when one of the long params has a value which
doesn't falls into the int range? How should we handle that?
##########
ql/src/java/org/apache/hadoop/hive/ql/udf/UDFSubstr.java:
##########
@@ -112,10 +159,34 @@ private int[] makeIndex(int pos, int len, int inputLen) {
private final IntWritable maxValue = new IntWritable(Integer.MAX_VALUE);
+ private final LongWritable maxLongValue = new LongWritable(Long.MAX_VALUE);
+
public Text evaluate(Text s, IntWritable pos) {
return evaluate(s, pos, maxValue);
}
+ public Text evaluate(Text s, LongWritable pos) {
+ return evaluate(s, pos, maxLongValue);
+ }
+
+ public BytesWritable evaluate(BytesWritable bw, LongWritable pos,
LongWritable len) {
+
+ if ((bw == null) || (pos == null) || (len == null)) {
+ return null;
+ }
+
+ if ((len.get() <= 0)) {
+ return new BytesWritable();
+ }
+
+ int[] index = makeIndex(pos.get(), len.get(), bw.getLength());
+ if (index == null) {
+ return new BytesWritable();
+ }
+
+ return new BytesWritable(Arrays.copyOfRange(bw.getBytes(), index[0],
index[1]));
+ }
Review Comment:
The pattern for the `Text` version of the `evaluate` can be applied here.
Issue Time Tracking
-------------------
Worklog Id: (was: 792415)
Time Spent: 40m (was: 0.5h)
> Allow substr to take bigint as parameters
> -----------------------------------------
>
> Key: HIVE-26294
> URL: https://issues.apache.org/jira/browse/HIVE-26294
> Project: Hive
> Issue Type: Improvement
> Components: Types
> Reporter: Steve Carlin
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Small enhancement
> Impala allows a bigint as an argument for the substr function. We should
> allow Hive to allow bigint arguments too.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)