Agreed on all. Except why should dots in column names be any different than schema and table names?
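To make the quoting question concrete, here is a minimal sketch of how a path could be split when any part, column names included, may be wrapped in double quotes. It is written in Python purely for illustration and is not MetaModel's parser; the function name `split_path` is hypothetical, and the sketch assumes that quote characters never appear inside a name.

```python
def split_path(path):
    """Split a dotted path into tokens; dots inside "double quotes" are kept."""
    tokens = []
    current = []
    in_quotes = False
    for ch in path:
        if ch == '"':
            # Toggle quoting; the quote character itself is not part of the name.
            in_quotes = not in_quotes
        elif ch == '.' and not in_quotes:
            # An unquoted dot ends the current token.
            tokens.append(''.join(current))
            current = []
        else:
            current.append(ch)
    if in_quotes:
        raise ValueError('unbalanced quote in path: ' + path)
    tokens.append(''.join(current))
    return tokens


# The two ambiguous cases from the thread become distinguishable:
print(split_path('"foo.bar".baz'))   # ['foo.bar', 'baz']
print(split_path('foo."bar.baz"'))   # ['foo', 'bar.baz']
# Nothing in this approach treats the column position differently:
print(split_path('"documents"."people.csv"."first.name"'))
```

With this kind of splitting, schema, table and column names are all handled the same way, which is the point of the question above.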
2013/8/16 Hans Drexler <[email protected]>:
> I believe that probably, *every* convention will have its drawbacks. Using a
> factory can help on one hand, but it can also cause great confusion if things
> get mixed. It also makes things more complex. If we clearly document the
> choice made, I will live with that.
>
> My main point is that we should try to write and document the software in
> such a way that MetaModel users will not get confused. I like the quotes idea,
> since that will allow the user to explicitly express what is intended. But
> then, let's extend it to something like this:
>
> "schema_name"."table_name"."column_name"
>
> Where schema_name and table_name can contain dots ("."). (I guess column
> names cannot...)
>
> I hope you don't mind me rambling about this...
>
> kind regards,
>
> Hans
>
> -----Original Message-----
> From: Kasper Sørensen [mailto:[email protected]]
> Sent: Wednesday, August 14, 2013 2:59 PM
> To: [email protected]
> Subject: Re: [DISCUSS] use folder name as schema name for file based
> DataContexts
>
> With those different preferences, we could even consider making something
> like a "TableNameFactory" which converts filenames into table names. But I
> guess the crucial point is which default convention to use.
>
> Underscoring makes it a bit cleaner to look at the column or table paths, but
> it also makes the representation less direct. A user could start wondering if
> there are other characters than dots that will be replaced by underscores etc.
>
> It should be noted that MM's parser does support dots in both table and
> schema names, so this is probably mostly a question of aesthetics.
>
> The ambiguity that you point out is also interesting. So far I haven't seen
> it appear in real life, but technically it could occur that you had two pairs
> of schemas and tables that would generate an ambiguous table path.
> For instance:
>
> Schema: foo.bar
> Table: baz
>
> and
>
> Schema: foo
> Table: bar.baz
>
> The parser would currently favor the second schema ("foo") since it
> incrementally tries for schema/table/column matches with every dot-separated
> token. An improvement to the parser would be to allow quote characters, so
> that you could express your table path like this then:
>
> "foo.bar".baz
>
> Also I want to note that some databases do support dots in
> schema/table/column names, so this ambiguity can (although rarely) also occur
> in an RDBMS or other data sources. It would also be quite common to have some
> separator (not necessarily a dot) in NoSQL database column names, to indicate
> a nested field. In HBase for instance they are referred to using a colon, like
> this: "columnFamily:column".
>
> All in all I am mostly feeling like preserving the dots from the filenames,
> but am also very curious what other people think!
>
> 2013/8/14 Hans Drexler <[email protected]>:
>> Hi,
>>
>> First I agree with bumping this issue. When at the customer, this thing
>> caused a lot of time spent in figuring out what was going on. I am not sure
>> if I like the extension as part of the table name, because:
>> - I would never create a table in a relational database with a dot in
>> the name
>> - It creates an ambiguity. If you have a "full" path name to a column, like
>> "documents.people.csv.name", then it is not clear if the schema name is
>> "documents.people" and the table name is "csv", or that the schema name is
>> "documents" and the table name is "people.csv". It seems natural to me that
>> schema names contain dots, but not table names.
>>
>> Alternatives:
>> - Leave the extension out of the name (probably not acceptable, because then
>> you can no longer have two "tables" differing only in extension). Although I
>> must say that personally I think this would be the best solution.
>>
>> - Use a conventional name, like:
>> Schema name: Folder name
>> Table name: The filename, including extension (all dots replaced by
>> underscores).
>> Resulting in e.g. a column path like this:
>> documents.people_csv.name
>>
>> At the customer site, the file I needed to use was actually named after this
>> pattern: "bar/FOO.PEOPLE.IN.FILE". Using the convention, this would become:
>> bar.FOO_PEOPLE_IN_FILE
>>
>> IMHO this is preferable to "bar.foo.people.in.file"
>>
>> The problem is of course that it would now be impossible to have
>> another file "bar/FOO_PEOPLE_IN_FILE" :-(
>>
>> I am happy to hear other people's thoughts.
>>
>>
>> Hans
>>
>>
>> -----Original Message-----
>> From: Kasper Sørensen [mailto:[email protected]]
>> Sent: Wednesday, August 14, 2013 10:18 AM
>> To: [email protected]
>> Subject: Re: [DISCUSS] use folder name as schema name for file based
>> DataContexts
>>
>> Rats, made a mistake in that diff. The Gist has been updated [1] and now
>> contains the ResourceUtils class which was missing before.
>> [1] https://gist.github.com/kaspersorensen/6210970
>>
>> 2013/8/12 Kasper Sørensen <[email protected]>:
>>> Here's a proposed patch (implemented for CSV and fixedwidth files
>>> which are the modules that implemented the old schema naming pattern):
>>> https://gist.github.com/kaspersorensen/6210970
>>>
>>> 2013/8/10 Kasper Sørensen <[email protected]>:
>>>> https://issues.apache.org/jira/browse/METAMODEL-4
>>>>
>>>> 2013/8/10 Henry Saputra <[email protected]>:
>>>>> What is the JIRA for this one?
>>>>>
>>>>>
>>>>> On Fri, Aug 9, 2013 at 2:26 AM, Manuel van den Berg <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> +1
>>>>>>
>>>>>> (shouldn't I just vote on the Jira for this?)
>>>>>>
>>>>>> manuel
>>>>>>
>>>>>> > -----Original Message-----
>>>>>> > From: Kasper Sørensen [mailto:[email protected]]
>>>>>> > Sent: Friday, August 09, 2013 9:03
>>>>>> > To: [email protected]
>>>>>> > Subject: Re: [DISCUSS] use folder name as schema name for file
>>>>>> > based DataContexts
>>>>>> >
>>>>>> > Allow me to bump this issue (it's my impression that more people
>>>>>> > have joined in a bit late, after this topic was posted).
>>>>>> >
>>>>>> > I think this is one of the more important issues that I would
>>>>>> > want to fix before we make our first release at Apache.
>>>>>> >
>>>>>> > 2013/7/24 Kasper Sørensen <[email protected]>:
>>>>>> > > Right now we have this slightly odd naming convention for
>>>>>> > > schema and table names when building metadata for e.g. a CSV
>>>>>> > > file or a fixed width value file.
>>>>>> > >
>>>>>> > > Schema name: The filename, including file extension.
>>>>>> > > Table name: The filename without extension.
>>>>>> > > Resulting in e.g. a column path like this:
>>>>>> > > people.csv.people.name
>>>>>> > >
>>>>>> > > I suggest we change it to this convention:
>>>>>> > >
>>>>>> > > Schema name: Folder name
>>>>>> > > Table name: The filename, including file extension.
>>>>>> > > Resulting in e.g. a column path like this:
>>>>>> > > documents.people.csv.name
>>>>>> > >
>>>>>> > > Why do I think this would be an improvement?
>>>>>> > >
>>>>>> > > 1) Because this would first of all make a kind of sense to the
>>>>>> > > user to see the file system's hierarchy reflected in the schema
>>>>>> > > model.
>>>>>> > > 2) Because it allows us to make these DataContexts operate
>>>>>> > > not on a single file, but on a directory of files. I have seen
>>>>>> > > quite a number of times by now that users of MetaModel, or
>>>>>> > > users of e.g. DataCleaner, which uses MetaModel quite heavily,
>>>>>> > > want to do this sort of stuff.
>>>>>> > > 3) The removing of the file extension stuff is kind of broken
>>>>>> > > and a strange convention in the first place.
>>>>>> > >
>>>>>> > > While this doesn't really break backwards compatibility in
>>>>>> > > terms of Java code, it would break configuration files and
>>>>>> > > other stuff of applications that use MetaModel. But I do
>>>>>> > > believe that can be communicated and handled through carefully
>>>>>> > > explaining the new convention on the migration page (that I recently
>>>>>> > > started writing [1]).
>>>>>> > >
>>>>>> > > What do you think?
>>>>>> > >
>>>>>> > > [1]
>>>>>> > > http://wiki.apache.org/metamodel/MigratingFromEobjectsMetaModel
>>>>>>
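For reference, the three naming conventions discussed in this thread can be compared side by side. The following is only a rough sketch in Python, not MetaModel code: the function names are hypothetical, and it assumes "/"-separated relative file paths like the examples used in the messages above.

```python
from pathlib import PurePosixPath


def old_names(file_path):
    """Old convention: schema = filename with extension, table = filename without."""
    p = PurePosixPath(file_path)
    return p.name, p.stem


def proposed_names(file_path):
    """Proposed convention: schema = folder name, table = filename with extension."""
    p = PurePosixPath(file_path)
    return p.parent.name, p.name


def underscored_names(file_path):
    """Alternative from the thread: like the proposal, but dots in the table
    name are replaced by underscores."""
    schema, table = proposed_names(file_path)
    return schema, table.replace('.', '_')


def column_path(names, column):
    """Join schema, table and column into an unquoted dotted path."""
    schema, table = names
    return '.'.join([schema, table, column])


print(column_path(old_names('documents/people.csv'), 'name'))
# people.csv.people.name
print(column_path(proposed_names('documents/people.csv'), 'name'))
# documents.people.csv.name
print(column_path(underscored_names('documents/people.csv'), 'name'))
# documents.people_csv.name
```

The underscore variant also shows the collision mentioned in the thread: `underscored_names('bar/FOO.PEOPLE.IN.FILE')` and `underscored_names('bar/FOO_PEOPLE_IN_FILE')` both give the table name `FOO_PEOPLE_IN_FILE`.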
