+1

Regards
Ankit

On Tue, Aug 20, 2013 at 4:26 PM, Kasper Sørensen <[email protected]> wrote:

> I've updated my gist/patch [1] to also support quotes in the
> table/column paths. Let's have a vote on this patch, to see if we
> can get this in.
>
> [1] https://gist.github.com/kaspersorensen/6210970
>
> 2013/8/20 Kasper Sørensen <[email protected]>:
> > Agreed on all. Except why should dots in column names be any different
> > than schema and table names?
> >
> > 2013/8/16 Hans Drexler <[email protected]>:
> >> I believe that probably, *every* convention will have its drawbacks.
> >> Using a factory can help on one hand, but it can also cause great
> >> confusion if things get mixed. It also makes things more complex. If we
> >> clearly document the choice made, I will live with that.
> >>
> >> My main point is that we should try to write and document the software
> >> in such a way that MetaModel users will not get confused. I like the
> >> quotes idea, since that will allow the user to explicitly express what
> >> is intended. But then, let's extend it to something like this:
> >>
> >> "schema_name"."table_name"."column_name"
> >>
> >> Where schema_name and table_name can contain dots ("."). (I guess
> >> column names cannot...)
> >>
> >> I hope you don't mind me rambling about this...
> >>
> >> kind regards,
> >>
> >> Hans
> >>
> >> -----Original Message-----
> >> From: Kasper Sørensen [mailto:[email protected]]
> >> Sent: Wednesday, August 14, 2013 2:59 PM
> >> To: [email protected]
> >> Subject: Re: [DISCUSS] use folder name as schema name for file based DataContexts
> >>
> >> With those different preferences, we could even consider making
> >> something like a "TableNameFactory" which converts filenames into table
> >> names. But I guess the crucial point is which default convention to use.
> >>
> >> Underscoring makes it a bit cleaner to look at the column or table
> >> paths, but it also makes the representation less direct. A user could
> >> start wondering if there are other characters than dots that will be
> >> replaced by underscores etc.
> >>
> >> It should be noted that MM's parser does support dots in both table and
> >> schema names, so this is probably mostly a question of aesthetics.
> >>
> >> The ambiguity that you point out is also interesting. So far I haven't
> >> seen it appear in real life, but technically it could occur that you had
> >> two pairs of schemas and tables that would generate an ambiguous table
> >> path. For instance:
> >>
> >> Schema: foo.bar
> >> Table: baz
> >>
> >> and
> >>
> >> Schema: foo
> >> Table: bar.baz
> >>
> >> The parser would currently favor the second schema ("foo") since it
> >> incrementally tries for schema/table/column matches with every
> >> dot-separated token. An improvement to the parser would be to allow
> >> quote characters, so that you could express your table path like this:
> >>
> >> "foo.bar".baz
> >>
> >> Also I want to note that some databases do support dots in
> >> schema/table/column names, so this ambiguity can (although rarely) also
> >> occur in an RDBMS or other data sources. It would also be quite common
> >> to have some separator (not necessarily a dot) in NoSQL database column
> >> names, to indicate a nested field. In HBase for instance they are
> >> referred to using a colon, like this: "columnFamily:column".
> >>
> >> All in all I am mostly feeling like preserving the dots from the
> >> filenames, but am also very curious what other people think!
> >>
> >> 2013/8/14 Hans Drexler <[email protected]>:
> >>> Hi,
> >>>
> >>> First I agree with bumping this issue. When at the customer, this
> >>> thing caused a lot of time spent in figuring out what was going on. I
> >>> am not sure if I like the extension as part of the table name, because:
> >>> - I would never create a table in a relational database with a dot in
> >>>   the name
> >>> - It creates an ambiguity. If you have a "full" path name to a column,
> >>>   like "documents.people.csv.name", then it is not clear if the schema
> >>>   name is "documents.people" and the table name is "csv", or that the
> >>>   schema name is "documents" and the table name is "people.csv". It
> >>>   seems natural to me that schema names contain dots, but not table
> >>>   names.
> >>>
> >>> Alternatives:
> >>> - Leave the extension out of the name (probably not acceptable,
> >>>   because then you can no longer have two "tables" differing only in
> >>>   extension). Although I must say that personally I think this would
> >>>   be the best solution.
> >>>
> >>> - Use a conventional name, like:
> >>>   Schema name: Folder name
> >>>   Table name: The filename, including extension (all dots replaced by
> >>>   underscores).
> >>>   Resulting in e.g. a column path like this:
> >>>   documents.people_csv.name
> >>>
> >>> At the customer site, the file I needed to use was actually named
> >>> according to this pattern: "bar/FOO.PEOPLE.IN.FILE". Using the
> >>> convention, this would become:
> >>> bar.FOO_PEOPLE_IN_FILE
> >>>
> >>> IMHO this is preferable to "bar.foo.people.in.file"
> >>>
> >>> The problem is of course that it would now be impossible to have
> >>> another file "bar/FOO_PEOPLE_IN_FILE" :-(
> >>>
> >>> I am happy to hear other people's thoughts.
> >>>
> >>>
> >>> Hans
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: Kasper Sørensen [mailto:[email protected]]
> >>> Sent: Wednesday, August 14, 2013 10:18 AM
> >>> To: [email protected]
> >>> Subject: Re: [DISCUSS] use folder name as schema name for file based DataContexts
> >>>
> >>> Rats, made a mistake in that diff. The Gist has been updated [1] and
> >>> now contains the ResourceUtils class which was missing before.
> >>>
> >>> [1] https://gist.github.com/kaspersorensen/6210970
> >>>
> >>> 2013/8/12 Kasper Sørensen <[email protected]>:
> >>>> Here's a proposed patch (implemented for CSV and fixedwidth files
> >>>> which are the modules that implemented the old schema naming pattern):
> >>>> https://gist.github.com/kaspersorensen/6210970
> >>>>
> >>>> 2013/8/10 Kasper Sørensen <[email protected]>:
> >>>>> https://issues.apache.org/jira/browse/METAMODEL-4
> >>>>>
> >>>>> 2013/8/10 Henry Saputra <[email protected]>:
> >>>>>> What is the JIRA for this one?
> >>>>>>
> >>>>>>
> >>>>>> On Fri, Aug 9, 2013 at 2:26 AM, Manuel van den Berg <
> >>>>>> [email protected]> wrote:
> >>>>>>
> >>>>>>> +1
> >>>>>>>
> >>>>>>> (shouldn't I just vote on the Jira for this?)
> >>>>>>>
> >>>>>>> manuel
> >>>>>>>
> >>>>>>> > -----Original Message-----
> >>>>>>> > From: Kasper Sørensen [mailto:[email protected]]
> >>>>>>> > Sent: Friday, August 09, 2013 9:03
> >>>>>>> > To: [email protected]
> >>>>>>> > Subject: Re: [DISCUSS] use folder name as schema name for file
> >>>>>>> > based DataContexts
> >>>>>>> >
> >>>>>>> > Allow me to bump this issue (it's my impression that more people
> >>>>>>> > have joined in a bit late, after this topic was posted).
> >>>>>>> >
> >>>>>>> > I think this is one of the more important issues that I would
> >>>>>>> > want to fix before we make our first release at Apache.
> >>>>>>> >
> >>>>>>> > 2013/7/24 Kasper Sørensen <[email protected]>:
> >>>>>>> > > Right now we have this slightly odd naming convention for
> >>>>>>> > > schema and table names when building metadata for e.g. a CSV
> >>>>>>> > > file or a fixed width value file.
> >>>>>>> > >
> >>>>>>> > > Schema name: The filename, including file extension.
> >>>>>>> > > Table name: The filename without extension.
> >>>>>>> > > Resulting in e.g. a column path like this:
> >>>>>>> > > people.csv.people.name
> >>>>>>> > >
> >>>>>>> > > I suggest we change it to this convention:
> >>>>>>> > >
> >>>>>>> > > Schema name: Folder name
> >>>>>>> > > Table name: The filename, including file extension.
> >>>>>>> > > Resulting in e.g. a column path like this:
> >>>>>>> > > documents.people.csv.name
> >>>>>>> > >
> >>>>>>> > > Why do I think this would be an improvement?
> >>>>>>> > >
> >>>>>>> > > 1) Because it would first of all make sense to the user to see
> >>>>>>> > > the file system's hierarchy reflected in the schema model.
> >>>>>>> > > 2) Because it allows us to make these DataContexts operate
> >>>>>>> > > not on a single file, but on a directory of files. I have seen
> >>>>>>> > > quite a number of times by now that users of MetaModel, or
> >>>>>>> > > users of e.g. DataCleaner, which uses MetaModel quite heavily,
> >>>>>>> > > want to do this sort of stuff.
> >>>>>>> > > 3) The removing of the file extension stuff is kind of broken
> >>>>>>> > > and a strange convention in the first place.
> >>>>>>> > >
> >>>>>>> > > While this doesn't really break backwards compatibility in
> >>>>>>> > > terms of Java code, it would break configuration files and
> >>>>>>> > > other stuff of applications that use MetaModel. But I do
> >>>>>>> > > believe that can be communicated and handled through carefully
> >>>>>>> > > explaining the new convention on the migration page (that I
> >>>>>>> > > recently started writing [1]).
> >>>>>>> > >
> >>>>>>> > > What do you think?
> >>>>>>> > >
> >>>>>>> > > [1]
> >>>>>>> > > http://wiki.apache.org/metamodel/MigratingFromEobjectsMetaModel
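
For readers following the quoting proposal, here is a minimal sketch of how a quote-aware path split could resolve the "foo.bar"/"baz" versus "foo"/"bar.baz" ambiguity discussed in the thread. It is not the code from the gist: the class name QuotedPathTokenizer and its tokenize method are hypothetical, and the sketch assumes double quotes only delimit tokens (no escaping of quotes inside a name).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper, not part of MetaModel: splits a schema/table/column
 * path on dots, except for dots that appear inside double quotes.
 */
final class QuotedPathTokenizer {

    private QuotedPathTokenizer() {
    }

    static List<String> tokenize(String path) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < path.length(); i++) {
            char c = path.charAt(i);
            if (c == '"') {
                // Quotes toggle "literal" mode and are not part of the token.
                inQuotes = !inQuotes;
            } else if (c == '.' && !inQuotes) {
                // An unquoted dot ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }

        if (inQuotes) {
            throw new IllegalArgumentException("Unbalanced quote in path: " + path);
        }
        tokens.add(current.toString());
        return tokens;
    }
}
```

With this sketch, "foo.bar".baz yields the tokens foo.bar and baz, while foo."bar.baz" yields foo and bar.baz, so the user decides which schema is meant. An unquoted path such as foo.bar.baz still splits into three tokens, and it would remain up to the parser's incremental matching to decide how they map to schema, table and column.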
