Github user ajantha-bhat commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2804#discussion_r228413841
--- Diff:
store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonSchemaReader.java
---
@@ -59,11 +60,30 @@ public static Schema readSchemaInSchemaFile(String
schemaFilePath) throws IOExce
/**
* Read carbondata file and return the schema
*
- * @param dataFilePath complete path including carbondata file name
+ * @param path complete path including carbondata file name
* @return Schema object
* @throws IOException
*/
- public static Schema readSchemaInDataFile(String dataFilePath) throws IOException {
+ public static Schema readSchemaInDataFile(String path) throws IOException {
+ String dataFilePath = path;
+ if (!(dataFilePath.contains(".carbondata"))) {
+ CarbonFile[] carbonFiles = FileFactory
+ .getCarbonFile(path)
+ .listFiles(new CarbonFileFilter() {
+ @Override
+ public boolean accept(CarbonFile file) {
+ if (file == null) {
+ return false;
+ }
+ return file.getName().endsWith(".carbondata");
+ }
+ });
+ if (carbonFiles == null || carbonFiles.length < 1) {
+ throw new RuntimeException("Carbon data file not exists.");
+ }
+ dataFilePath = carbonFiles[0].getAbsolutePath();
--- End diff --
This takes only one data file (the first one)?
What if the folder has multiple files with different schemas? What if the user
also wants schema info from a particular file?
Supporting schema reads from a folder is not required, since this API is
exposed to the user, who already has the list of files.
a) To read one file, the user passes a single file to this API -- already
supported.
b) To read multiple files, the user can list the files, pick the ones whose
schema is needed, and call our API for each of them -- already supported.
Just reading the first file from a folder doesn't make sense. This PR is not
required, as the existing API already supports all user scenarios.
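Option (b) can be sketched roughly as below. This is a minimal illustration, not code from the PR: `SchemaPerFile`, `SchemaReader` and `readAll` are hypothetical names, and `SchemaReader` is only a stand-in for a method shaped like `readSchemaInDataFile(String) throws IOException`. It also assumes a local folder, whereas the SDK goes through `FileFactory` for other file systems.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

class SchemaPerFile {

  /** Hypothetical stand-in for a reader such as readSchemaInDataFile(String). */
  interface SchemaReader<S> {
    S read(String dataFilePath) throws IOException;
  }

  /**
   * Lists every *.carbondata file directly under the folder and reads the
   * schema of each one, so files with different schemas are all reported
   * instead of only the first file.
   */
  static <S> Map<String, S> readAll(Path folder, SchemaReader<S> reader)
      throws IOException {
    // TreeMap keeps the result ordered by file path for stable output.
    Map<String, S> schemas = new TreeMap<>();
    try (DirectoryStream<Path> files =
             Files.newDirectoryStream(folder, "*.carbondata")) {
      for (Path file : files) {
        if (!Files.isRegularFile(file)) {
          continue; // skip directories that happen to match the pattern
        }
        String dataFilePath = file.toAbsolutePath().toString();
        schemas.put(dataFilePath, reader.read(dataFilePath));
      }
    }
    return schemas;
  }
}
```

Assuming the existing signature shown in the diff, the caller would pass `CarbonSchemaReader::readSchemaInDataFile` as the reader and can then compare the returned schemas or pick the file of interest.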
---