My views on syntax have changed over time, and I now feel type[dimension] may 
not be the best option, so it has dropped in my personal ranking. This is my 
current preference:

1) DENSE <type>[dimension] | NON NULL <type>[dimension]
2) VECTOR<type, dimension>
3) type[dimension]

My reasoning for this order:

* type[dimension] looks like syntax sugar for array<type, dimension>, so users 
may assume list/array semantics, but we restrict it to non-null elements in a 
frozen array.
* VECTOR as a prefix feels out of place, but VECTOR as a direct type makes 
more sense. It also leaves room for a possible future VECTOR<type>, the 
non-fixed-length version of this type. What makes VECTOR different from 
list/array? Its elements are non-null and it is frozen. I don’t feel that 
VECTOR really tells users to expect non-null or frozen semantics, since 
different VECTOR types exist for those reasons (sparse vs dense).
* DENSE may be confusing for people coming from languages where it just means 
“sequential layout”, which is what our frozen array/list already is. But since 
the target user comes from an ML background, this shouldn’t cause much 
confusion. DENSE just means FROZEN in Cassandra, with NON NULL elements 
(SPARSE allows NULL and isn’t frozen), so DENSE acts as syntax sugar for 
frozen<non null type[dimension]>.
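
To make the semantics I am attaching to DENSE concrete, here is a rough 
Python sketch of "fixed dimension, non-null elements, frozen". It is only an 
illustration: DenseVector and dense() are hypothetical names I made up for 
this email, and nothing here is Cassandra code or a proposed implementation.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class DenseVector:
    """Hypothetical stand-in for DENSE <type>[dimension]; not Cassandra code."""

    dimension: int
    values: Tuple[float, ...]

    def __post_init__(self) -> None:
        # Fixed length: the element count must match the declared dimension.
        if len(self.values) != self.dimension:
            raise ValueError(
                f"expected {self.dimension} elements, got {len(self.values)}"
            )
        # Non-null: every element must be present.
        if any(v is None for v in self.values):
            raise ValueError("null elements are not allowed")


def dense(dimension: int, values: Sequence) -> DenseVector:
    """Build a DenseVector, copying the input into an immutable tuple."""
    return DenseVector(dimension, tuple(values))


v = dense(3, [0.1, 0.2, 0.3])
```

With this model, dense(3, [0.1, None, 0.3]) and dense(3, [0.1, 0.2]) both 
raise ValueError, and assigning to v.values fails because the instance is 
frozen. A plain list/array would accept all three, which is the mismatch I am 
worried about with type[dimension].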


> On May 4, 2023, at 4:13 AM, Brandon Williams <dri...@gmail.com> wrote:
> 
> 1. VECTOR<FLOAT,n>
> 2. VECTOR FLOAT[n]
> 3. FLOAT[N]   (Non null by default)
> 
> Redundant or not, I think having the VECTOR keyword helps signify what
> the app is generally about and helps get buy-in from ML stakeholders.
> 
> On Thu, May 4, 2023 at 3:45 AM Benedict <bened...@apache.org> wrote:
>> 
>> Hurrah for initial agreement.
>> 
>> For syntax, I think one option was just FLOAT[N]. In VECTOR FLOAT[N], VECTOR 
>> is redundant - FLOAT[N] is fully descriptive by itself. I don’t think VECTOR 
>> should be used to simply imply non-null, as this would be very unintuitive. 
>> More logical would be NONNULL, if this is the only condition being applied. 
>> Alternatively for arrays we could default to NONNULL and later introduce 
>> NULLABLE if we want to permit nulls.
>> 
>> If the word vector is to be used it makes more sense to make it look like a 
>> list, so VECTOR<FLOAT, N> as here the word VECTOR is clearly not redundant.
>> 
>> So, I vote:
>> 
>> 1) (NON NULL) FLOAT[N]
>> 2) FLOAT[N]   (Non null by default)
>> 3) VECTOR<FLOAT, N>
>> 
>> 
>> 
>> On 4 May 2023, at 08:52, Mick Semb Wever <m...@apache.org> wrote:
>> 
>> 
>>> 
>>> Did we agree on a CQL syntax?
>>> 
>>> I don’t believe there has been a poll on CQL syntax… my understanding 
>>> reading all the threads is that there are ~4-5 options and none are -1ed, 
>>> so I believe we are waiting for majority rule on this?
>> 
>> 
>> 
>> Re-reading that thread, IIUC the valid choices remaining are…
>> 
>> 1. VECTOR FLOAT[n]
>> 2. FLOAT VECTOR[n]
>> 3. VECTOR<FLOAT,n>
>> 4. VECTOR[n]<FLOAT>
>> 5. ARRAY<FLOAT, n>
>> 6. NON-NULL FROZEN<FLOAT[n]>
>> 
>> 
>> Yes I'm putting my preference (1) first ;) because (banging on) if the 
>> future of CQL will have FLOAT[n] and FROZEN<FLOAT[n]>, where the VECTOR 
>> keyword, for general CQL users, just means "non-null and frozen", these 
>> gel best together.
>> 
>> Options (5) and (6) are for those that feel we can and should provide this 
>> type without introducing the vector keyword.
>> 
>> 
