Hi everyone,

I am currently looking into *ASTERIXDB-3745* and would like to draft a PR for this issue ahead of GSoC 2026. Since my proposal also focuses on vector queries, working through the vctree-index training dataflow is a good stepping stone.

*My understanding of the problem:* When a vector index (vctree-index) is being trained, the system takes a sample of the records. Because AsterixDB supports Open Types, this random sample might pick up records where the vector field is missing, null, or has a different dimensionality than what the index definition requires. Handing this heterogeneous sample to the math algorithms causes the training to fail.

*My proposed approach:* I plan to implement a zero-copy validation filter during the dataflow sampling phase. Before a tuple is accepted into the reservoir, the logic will:

1. Check the "ATypeTag" at the "fieldStartOffset" to ensure the vector is not null or missing.
2. Use getFieldLength() to mathematically verify that the byte size of the field perfectly matches the expected dimensionality of the index (accounting for the type tag and array headers).

Any tuple failing these checks will simply be skipped by the sampler.

*Where I am currently blocked:* I have my local environment built and have been trying to trace the execution path. I tried searching for keywords like train_list and ReservoirSample within the Hyracks and compiler layers, but I couldn't pinpoint the specific file. Could you please point me to the right Java class or package to look for?

*(Note: I am reaching out via email because my public Jira account request is still pending approval, so I cannot comment directly on the ticket yet)*

Thank you,
Tejesh
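
P.S. To make the two checks in the proposed approach concrete, here is a minimal standalone sketch of what I have in mind. It is not AsterixDB code: the class name, the tag constants, and the header size are placeholders I made up for illustration, and I am assuming a fixed-size header followed by 8-byte elements, which may not match the real serialized array layout or the real ATypeTag byte values.

```java
/** Hypothetical sketch of the validation filter; not part of AsterixDB. */
final class VectorSampleFilter {
    // Placeholder tag bytes. These are NOT the real ATypeTag serialization values.
    static final byte TAG_MISSING = 0;
    static final byte TAG_NULL = 1;
    static final byte TAG_ARRAY = 2;

    // Assumed layout: 1 tag byte + fixed array header + dim * 8-byte elements.
    static final int TAG_BYTES = 1;
    static final int ASSUMED_HEADER_BYTES = 9; // assumption, to be replaced by the real header size
    static final int ELEMENT_BYTES = Double.BYTES;

    /** Byte length a well-formed vector field would have under the assumed layout. */
    static int expectedFieldLength(int dim) {
        return TAG_BYTES + ASSUMED_HEADER_BYTES + dim * ELEMENT_BYTES;
    }

    /**
     * Returns true only if the field is tagged as an array and its length matches
     * the index dimensionality. Reads the bytes in place; nothing is copied.
     */
    static boolean isTrainable(byte[] data, int fieldStartOffset, int fieldLength, int dim) {
        // Reject fields that are empty or that run past the end of the buffer.
        if (fieldLength < TAG_BYTES || fieldStartOffset < 0
                || fieldStartOffset + fieldLength > data.length) {
            return false;
        }
        // Check 1: the tag byte. Missing, null, and any non-array type are rejected.
        if (data[fieldStartOffset] != TAG_ARRAY) {
            return false;
        }
        // Check 2: the byte length must equal the length implied by the dimensionality.
        return fieldLength == expectedFieldLength(dim);
    }
}
```

The sampler would call isTrainable(...) with the field's start offset and length before offering a tuple to the reservoir, and skip the tuple when it returns false.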
