[ 
https://issues.apache.org/jira/browse/IMPALA-14566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18041150#comment-18041150
 ] 

Raghav Jindal commented on IMPALA-14566:
----------------------------------------

I tried to push my code to a private branch, but I do not have the required 
permissions:
{code:java}
git push origin semanticsearchtest
remote: Permission to apache/impala.git denied to rajindal8.
fatal: unable to access 'https://github.com/apache/impala.git/': The requested URL returned error: 403
{code}

Initial code for vector-functions.h:
{code:java}
#ifndef IMPALA_EXPRS_VECTOR_FUNCTIONS_H
#define IMPALA_EXPRS_VECTOR_FUNCTIONS_H

#include <cstdint>

#include "udf/udf.h"

namespace impala {

using impala_udf::FunctionContext;
using impala_udf::DoubleVal;
using impala_udf::CollectionVal;

class VectorFunctions {
 public:
  /// The distance functions below return DOUBLE rather than FLOAT for better
  /// precision: distance calculations usually involve square roots, which benefit
  /// from the 15-17 digit precision of DOUBLE versus 7 digits in FLOAT.
  ///
  /// Returns the Euclidean distance as a DOUBLE, or NULL if the inputs are invalid.
  /// ctx: function context for memory allocation and error reporting
  /// vec1: the first vector, as ARRAY<FLOAT>
  /// vec2: the second vector, as ARRAY<FLOAT>
  static DoubleVal EuclideanDistance(FunctionContext* ctx,
      const CollectionVal& vec1, const CollectionVal& vec2);

  static DoubleVal CosineSimilarity(FunctionContext* ctx,
      const CollectionVal& vec1, const CollectionVal& vec2);

  /// Prepare function to initialize the function state.
  static void VectorDistancePrepare(FunctionContext* ctx,
      FunctionContext::FunctionStateScope scope);

  /// Close function to clean up the function state.
  static void VectorDistanceClose(FunctionContext* ctx,
      FunctionContext::FunctionStateScope scope);

 private:
  /// Helper to get a float value from an array element. For ARRAY<FLOAT>,
  /// elements are stored as tuples; this function extracts the float value from
  /// the tuple at the given index.
  /// array_ptr: pointer to the start of the array tuple data
  /// index: index of the element to retrieve
  /// tuple_size: size of each tuple in bytes
  /// slot_offset: offset of the float slot within the tuple
  /// Returns the float value, or 0.0 if the element is NULL.
  static float GetFloatFromArray(const uint8_t* array_ptr, int index,
      int tuple_size, int slot_offset);

  /// Helper function to check if an array element is NULL.
  static bool IsArrayElementNull(const uint8_t* array_ptr, int index,
      int tuple_size, int null_indicator_offset);
};

} // namespace impala

#endif // IMPALA_EXPRS_VECTOR_FUNCTIONS_H
{code}
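
To make the intent of the private helpers and of EuclideanDistance easier to review, here is a minimal standalone sketch that does not depend on Impala. Everything in it is an assumption for illustration: the names IsElementNullSketch, GetFloatSketch and EuclideanDistanceSketch are hypothetical, the tuple layout (a one-byte null indicator where nonzero means NULL, plus a float slot) is invented and may not match Impala's real layout, and a false return value stands in for SQL NULL. It treats a NULL element as invalid input, which is one possible reading of the comments in the header.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>

// ASSUMED layout (not Impala's actual one): element i starts at
// array_ptr + i * tuple_size; the byte at null_indicator_offset is nonzero
// when the element is NULL; the float payload sits at slot_offset.
bool IsElementNullSketch(const uint8_t* array_ptr, int index, int tuple_size,
                         int null_indicator_offset) {
  assert(index >= 0 && tuple_size > 0);
  const uint8_t* tuple = array_ptr + static_cast<std::size_t>(index) * tuple_size;
  return tuple[null_indicator_offset] != 0;
}

float GetFloatSketch(const uint8_t* array_ptr, int index, int tuple_size,
                     int slot_offset) {
  assert(index >= 0 && tuple_size > 0);
  const uint8_t* tuple = array_ptr + static_cast<std::size_t>(index) * tuple_size;
  float value;
  // memcpy avoids undefined behaviour if the slot is not aligned for float.
  std::memcpy(&value, tuple + slot_offset, sizeof(value));
  return value;
}

// Writes the distance to *out and returns true; returns false (modelling a
// NULL result) for missing buffers, empty or mismatched lengths, or NULL
// elements. Accumulates in double even though the inputs are float.
bool EuclideanDistanceSketch(const uint8_t* a, int len_a, const uint8_t* b,
                             int len_b, int tuple_size, int slot_offset,
                             int null_indicator_offset, double* out) {
  if (a == nullptr || b == nullptr || out == nullptr) return false;
  if (len_a != len_b || len_a <= 0) return false;
  double sum = 0.0;
  for (int i = 0; i < len_a; ++i) {
    if (IsElementNullSketch(a, i, tuple_size, null_indicator_offset) ||
        IsElementNullSketch(b, i, tuple_size, null_indicator_offset)) {
      return false;
    }
    const double diff =
        static_cast<double>(GetFloatSketch(a, i, tuple_size, slot_offset)) -
        static_cast<double>(GetFloatSketch(b, i, tuple_size, slot_offset));
    sum += diff * diff;
  }
  *out = std::sqrt(sum);
  return true;
}
```

The out-parameter plus bool return is only a stand-in for DoubleVal and its NULL state; in the real UDF the same checks would presumably map onto DoubleVal::null().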
 

 

> Add support for cosine similarity function
> ------------------------------------------
>
>                 Key: IMPALA-14566
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14566
>             Project: IMPALA
>          Issue Type: Task
>            Reporter: Abhishek Rawat
>            Assignee: Raghav Jindal
>            Priority: Major
>
> The cosine similarity function measures the angle between two vectors, 
> regardless of their length (magnitude). Use cases include measuring text 
> similarity; the function is ideal when the direction (semantic meaning/concept) 
> is more important than the magnitude.
> Impala doesn't support a native vector data type yet, so we could possibly 
> use an ARRAY<FLOAT> data type for representing vectors.
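
As a rough illustration of the description above, cosine similarity is the dot product of the two vectors divided by the product of their magnitudes. The sketch below is a standalone assumption, not the proposed Impala implementation: CosineSimilaritySketch is a hypothetical name, std::vector<float> stands in for ARRAY<FLOAT>, and a false return value stands in for SQL NULL (here chosen for empty, mismatched, or zero-magnitude inputs).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Writes the similarity (in [-1, 1]) to *out and returns true; returns false
// (modelling a NULL result) when the vectors are empty, differ in length, or
// either has zero magnitude, since the angle is undefined in those cases.
bool CosineSimilaritySketch(const std::vector<float>& a,
                            const std::vector<float>& b, double* out) {
  assert(out != nullptr);
  if (a.size() != b.size() || a.empty()) return false;

  // Accumulate in double to limit rounding error on long vectors.
  double dot = 0.0;
  double norm_a = 0.0;
  double norm_b = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    const double x = a[i];
    const double y = b[i];
    dot += x * y;
    norm_a += x * x;
    norm_b += y * y;
  }
  if (norm_a == 0.0 || norm_b == 0.0) return false;

  double sim = dot / (std::sqrt(norm_a) * std::sqrt(norm_b));
  // Clamp rounding noise so callers can safely pass the result to acos().
  if (sim > 1.0) sim = 1.0;
  if (sim < -1.0) sim = -1.0;
  *out = sim;
  return true;
}
```

Because the magnitudes are divided out, {1, 2, 3} and {2, 4, 6} score as (nearly) identical, which is the "direction over magnitude" property the description refers to.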



