[ 
https://issues.apache.org/jira/browse/DRILL-8340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17623085#comment-17623085
 ] 

ASF GitHub Bot commented on DRILL-8340:
---------------------------------------

jnturton commented on code in PR #2689:
URL: https://github.com/apache/drill/pull/2689#discussion_r1003065011


##########
contrib/udfs/src/test/java/org/apache/drill/exec/udfs/TestDateUtils.java:
##########
@@ -0,0 +1,77 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.udfs;
+
+import org.junit.Test;
+
+import java.time.LocalDate;
+import java.time.LocalDateTime;
+
+import static org.junit.Assert.assertEquals;
+
+public class TestDateUtils {
+
+  @Test
+  public void testDateFromString() {
+    LocalDate testDate = LocalDate.of(2022, 3,14);
+    LocalDate badDate = LocalDate.of(1970, 1, 1);
+    assertEquals(testDate, DateUtilFunctions.getDateFromString("2022-03-14"));
+    assertEquals(testDate, DateUtilFunctions.getDateFromString("3/14/2022"));
+    assertEquals(testDate, DateUtilFunctions.getDateFromString("14/03/2022", true));
+    assertEquals(testDate, DateUtilFunctions.getDateFromString("2022/3/14"));
+
+    // Test bad dates
+    assertEquals(badDate, DateUtilFunctions.getDateFromString(null));
+    assertEquals(badDate, DateUtilFunctions.getDateFromString("1975-13-56"));
+    assertEquals(badDate, DateUtilFunctions.getDateFromString("1975-1s"));

Review Comment:
   A footnote, since this case comes up quite often. Those of us who've done data 
analytics with pandas _did_ get used to floating-point NaN being used as a 
"sentinel" for missing or invalid data, but we should recognise it for what it 
was: a performance hack that pandas inherited from its NumPy foundation. It 
caused pain: ints were automatically cast to floats, despite the possible loss 
of precision, because hardware and C integer types have no NaN value; special 
code was needed to implement `skipna` so that NaNs behave like nulls rather 
than according to the IEEE float rules; and so on.
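   
   The same two costs can be reproduced outside pandas. The following is only a 
minimal Java sketch of the idea, not pandas or Drill code: the class and method 
names (`NaNSentinelDemo`, `widen`, `sumSkippingNaN`) are made up for 
illustration. It widens an integer column to doubles so that NaN can mark a 
missing value, then shows the precision loss and the hand-written skipping 
logic that this forces.

```java
public class NaNSentinelDemo {

    /** Hypothetical helper: widen an integer column to double so NaN can mark "missing". */
    static double[] widen(Long[] column) {
        double[] out = new double[column.length];
        for (int i = 0; i < column.length; i++) {
            out[i] = (column[i] == null) ? Double.NaN : column[i].doubleValue();
        }
        return out;
    }

    /** Hypothetical helper: sum while skipping NaN, a hand-rolled analogue of skipna. */
    static double sumSkippingNaN(double[] values) {
        double total = 0.0;
        for (double v : values) {
            if (!Double.isNaN(v)) {
                total += v;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long big = (1L << 53) + 1; // 2^53 + 1 has no exact double representation
        double[] widened = widen(new Long[] {big, null});

        // The round trip through double changes the integer value.
        System.out.println((long) widened[0] == big);      // false

        // Under IEEE 754 rules NaN is not equal to itself and poisons arithmetic.
        System.out.println(widened[1] == widened[1]);      // false
        System.out.println(Double.isNaN(1.0 + widened[1])); // true

        // Null-like behaviour has to be added by hand.
        System.out.println(sumSkippingNaN(new double[] {1.0, Double.NaN, 2.0})); // 3.0
    }
}
```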
   
   A one-line Drill query reveals the square-peg, round-hole relationship 
between IEEE 754 NaN and ANSI SQL null.
   ```
   apache drill> select cast('NaN' as float) = cast('NaN' as float), null = null;
   EXPR$0  true
   EXPR$1  null
   ```
   
   And indeed, [here is Wes McKinney talking about moving away from that 
approach](https://wesmckinney.com/blog/bitmaps-vs-sentinel-values/)[1]. Here's 
a relevant excerpt for us, one hop over in the SQL world, where null is a 
standardised first-class citizen.
   
   > From the perspective of databases and data warehousing, reserving certain 
values to mark a null (or NA) is widely considered unacceptable. NaN is valid 
data, as is INT32_MIN and other common values used as sentinels.
   
   [1] Also see 
https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html#missing-data-na
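   
   Applied to the test above, where bad input is expected to come back as 
1970-01-01, the concern can be sketched as follows. This is an illustration 
only, not the PR's implementation: `DateParseSketch`, `parseOrEpoch` and 
`parseOrEmpty` are hypothetical names, and the sketch accepts ISO-8601 strings 
only, unlike `getDateFromString`. `Optional` stands in here for what would be 
a SQL null on the Drill side.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.Optional;

public class DateParseSketch {

    static final LocalDate EPOCH = LocalDate.of(1970, 1, 1);

    /** Explicit-absence style: invalid or missing input yields an empty Optional. */
    static Optional<LocalDate> parseOrEmpty(String s) {
        if (s == null) {
            return Optional.empty();
        }
        try {
            return Optional.of(LocalDate.parse(s)); // ISO-8601 (yyyy-MM-dd) only in this sketch
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    /** Sentinel style: invalid or missing input collapses to 1970-01-01. */
    static LocalDate parseOrEpoch(String s) {
        return parseOrEmpty(s).orElse(EPOCH);
    }

    public static void main(String[] args) {
        // With a sentinel, a genuine epoch date and a garbage string are indistinguishable.
        System.out.println(parseOrEpoch("1970-01-01").equals(parseOrEpoch("1975-13-56"))); // true

        // With explicit absence, the two cases stay distinct.
        System.out.println(parseOrEmpty("1970-01-01").isPresent()); // true
        System.out.println(parseOrEmpty("1975-13-56").isPresent()); // false
    }
}
```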





> Add Additional Date Manipulation Functions (Part 1)
> ---------------------------------------------------
>
>                 Key: DRILL-8340
>                 URL: https://issues.apache.org/jira/browse/DRILL-8340
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Functions - Drill
>    Affects Versions: 1.20.2
>            Reporter: Charles Givre
>            Assignee: Charles Givre
>            Priority: Major
>             Fix For: 2.0.0
>
>
> This PR adds several utility functions to facilitate working with dates and 
> times.  These are modeled after the date/time functionality in MySQL.
> Specifically this adds:
>  * YEARWEEK(<date>):  Returns the year and week as an int, e.g. 202002.
>  * TIME_STAMP(<date string>):  Converts most anything that looks like a date 
> string into a timestamp.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
