Hi All Kudu developers,
I have been using Kudu in several projects, and it has been amazing. A few projects have recently raised requirements for storing large binary files (images, documents, etc.) in Kudu. We used to propose Kudu + HDFS (or another file system) as a workaround, but it is not really a good solution.

The main needs are:

1). Use Kudu as the only storage layer. As we store larger amounts of data and grow the Kudu cluster, the cluster should support both structured and unstructured data, so that we avoid managing another storage tier for images or documents.
2). It would be great to simplify the architecture from the business application's point of view by having a single data access layer (either at the Impala/Spark SQL level or at the Kudu API level) to manage a business data object or entity together with its related images/documents.

We are thinking about ways to extend Kudu to support large files: either through the current Binary data type, which has a size limitation (64K) due to known issues; or by introducing a new data type such as BLOB for storing images or documents ranging from a few hundred KB to a few MB; or by extending the Kudu API to store the files in a file system (which might be more suitable for even larger files). Many relational and NoSQL databases, such as HBase, Cassandra, and MapR-DB, offer different levels of support or take different design approaches.

I’d like to ask for your feedback or opinions:

1). Do you have a need to store larger content (such as images or documents, at the MB level) in Kudu?
2). Do you have any opinions on storing large content inside the database versus in a file system?

Your comments are much appreciated. Thanks!
