[ https://issues.apache.org/jira/browse/NIFI-13433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew M. Lim updated NIFI-13433:
---------------------------------
    Description: 
I have a ChunkDocument processor connected to PutChroma. The FlowFile content
produced by ChunkDocument is as follows:

{
  "text" : "Title 1",
  "metadata" : {
    "languages" : [ "Eng" ],
    "page_number" : 1,
    "filetype" : "application/pdf",
    "category" : "Title",
    "filename" : "pdf/PDF-test.pdf",
    "uuid" : "eeb8441d-0e4b-4d24-bd93-78e8f352306b",
    "chunk_index" : 0,
    "chunk_count" : 3,
    "parent_id" : null
  }
}
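To see which fields will trip Chroma's check before wiring up a flow, a small sketch can parse the FlowFile JSON above and list every metadata value that is not a str, int, float or bool, the types named in the ValueError reported below. `find_non_scalar_metadata` is a hypothetical helper for diagnosis, not part of NiFi or Chroma.

```python
import json

# Scalar types accepted according to the ValueError message in this report.
# (bool is a subclass of int in Python; it is listed for clarity.)
ALLOWED_TYPES = (str, int, float, bool)


def find_non_scalar_metadata(flowfile_text: str) -> dict:
    """Hypothetical helper: return metadata entries whose values are not
    str, int, float or bool, parsed from a ChunkDocument-style JSON record."""
    record = json.loads(flowfile_text)
    metadata = record.get("metadata", {})
    return {
        key: value
        for key, value in metadata.items()
        if not isinstance(value, ALLOWED_TYPES)
    }
```

Run against the FlowFile above, it returns `{'languages': ['Eng'], 'parent_id': None}`. The trace only reports `languages`, so whether Chroma would also reject the `null` in `parent_id` once the list is fixed is an assumption worth checking.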

Running PutChroma fails with the following error, because Chroma expects each
metadata value to be a str, int, float or bool, and "languages" is a list:

PythonProcessor[type=PutChroma, id=b65634ca-0211-30e4-0c1b-49776e379b4b] Failed to transform FlowFile[filename=pdf/PDF-test.pdf]: py4j.Py4JException: An exception was raised by the Python Proxy. Return Message: Traceback (most recent call last):
  File "/Users/andrew.lim/Downloads/nifi-2.0.0-M3/python/framework/py4j/java_gateway.py", line 2466, in _call_proxy
    return_value = getattr(self.pool[obj_id], method)(*params)
  File "/Users/andrew.lim/Downloads/nifi-2.0.0-M3/python/api/nifiapi/flowfiletransform.py", line 33, in transformFlowFile
    return self.transform(self.process_context, flowfile)
  File "/Users/andrew.lim/Downloads/nifi-2.0.0-M3/python/extensions/vectorstores/PutChroma.py", line 123, in transform
    collection.upsert(ids, embeddings, metadatas, texts)
  File "/Users/andrew.lim/Downloads/nifi-2.0.0-M3/./work/python/extensions/PutChroma/2.0.0-M3/chromadb/api/models/Collection.py", line 477, in upsert
    ) = self._validate_embedding_set(
  File "/Users/andrew.lim/Downloads/nifi-2.0.0-M3/./work/python/extensions/PutChroma/2.0.0-M3/chromadb/api/models/Collection.py", line 554, in _validate_embedding_set
    validate_metadatas(maybe_cast_one_to_many_metadata(metadatas))
  File "/Users/andrew.lim/Downloads/nifi-2.0.0-M3/./work/python/extensions/PutChroma/2.0.0-M3/chromadb/api/types.py", line 291, in validate_metadatas
    validate_metadata(metadata)
  File "/Users/andrew.lim/Downloads/nifi-2.0.0-M3/./work/python/extensions/PutChroma/2.0.0-M3/chromadb/api/types.py", line 259, in validate_metadata
    raise ValueError(
ValueError: Expected metadata value to be a str, int, float or bool, got ['Eng'] which is a <class 'list'>
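One possible workaround, sketched under stated assumptions rather than as the actual fix, is to coerce each metadata dict to scalar values before it reaches `collection.upsert(ids, embeddings, metadatas, texts)` (PutChroma.py line 123 in the trace). The hypothetical `flatten_metadata` joins lists into a single string and drops `None` values, assuming Chroma would reject those too.

```python
from typing import Any, Dict

SCALAR_TYPES = (str, int, float, bool)


def flatten_metadata(metadata: Dict[str, Any], separator: str = ",") -> Dict[str, Any]:
    """Hypothetical helper: coerce metadata so every value is str, int, float or bool.

    - lists and tuples are joined into one string, e.g. ["Eng"] -> "Eng"
    - None values are dropped (assumption: they would also fail validation)
    - any other non-scalar value (e.g. a nested dict) is converted with str()
    """
    flat: Dict[str, Any] = {}
    for key, value in metadata.items():
        if value is None:
            continue  # assumed invalid; omit the key instead of sending null
        if isinstance(value, SCALAR_TYPES):
            flat[key] = value
        elif isinstance(value, (list, tuple)):
            flat[key] = separator.join(str(item) for item in value)
        else:
            flat[key] = str(value)
    return flat
```

With the metadata above, `languages` becomes `"Eng"` and `parent_id` is omitted. Joining keeps every language, but an exact-match metadata filter would then have to compare against the joined string (for example `"Eng,Fra"`); keeping only the first list element is the other obvious choice, at the cost of discarding data.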



> PutChroma processor has errors when "languages" is a list
> ---------------------------------------------------------
>
>                 Key: NIFI-13433
>                 URL: https://issues.apache.org/jira/browse/NIFI-13433
>             Project: Apache NiFi
>          Issue Type: Bug
>            Reporter: Andrew M. Lim
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)