Hisoka-X commented on issue #9716: URL: https://github.com/apache/seatunnel/issues/9716#issuecomment-3242536822
> @Hisoka-X For this task, we have previously implemented a program that obtains text by parsing PDFs. We then perform sharding, embedding, and writing to the vector database based on the obtained text for RAG. Is the process to be implemented this time similar to this? I want to try to accomplish this task.

Yes, this is the feature we also want SeaTunnel to be able to do.

> Also, I would like to know what function Normalization aims to achieve?

Make sure Markdown/PDF/Word sources return the same CatalogTable. Please refer to https://github.com/apache/seatunnel/pull/9760#discussion_r2306274831
