+1

Regards
Ankit


On Tue, Aug 20, 2013 at 4:26 PM, Kasper Sørensen <
[email protected]> wrote:

> I've updated my gist/patch [1] to also support quotes in the
> table/column paths. Let's have a vote on this patch, to see if we
> can get this in.
>
> [1] https://gist.github.com/kaspersorensen/6210970
>
> 2013/8/20 Kasper Sørensen <[email protected]>:
> > Agreed on all. Except why should dots in column names be any different
> > than schema and table names?
> >
> > 2013/8/16 Hans Drexler <[email protected]>:
> >> I believe that probably *every* convention will have its drawbacks.
> >> Using a factory can help on one hand, but it can also cause great
> >> confusion if things get mixed. It also makes things more complex. If we
> >> clearly document the choice made, I will live with that.
> >>
> >> My main point is that we should try to write and document the software
> >> in such a way that MetaModel users will not get confused. I like the
> >> quotes idea, since that will allow the user to explicitly express what
> >> is intended. But then, let's extend it to something like this:
> >>
> >> "schema_name"."table_name"."column_name"
> >>
> >> Where schema_name and table_name can contain dots ("."). (I guess
> >> column names cannot...)
> >>
> >> I hope you don't mind me rambling about this...
> >>
> >> kind regards,
> >>
> >> Hans
> >>
> >> -----Original Message-----
> >> From: Kasper Sørensen [mailto:[email protected]]
> >> Sent: Wednesday, August 14, 2013 2:59 PM
> >> To: [email protected]
> >> Subject: Re: [DISCUSS] use folder name as schema name for file based
> >> DataContexts
> >>
> >> With those different preferences, we could even consider making
> >> something like a "TableNameFactory" which converts filenames into table
> >> names. But I guess the crucial point is which default convention to use.
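
To make the "TableNameFactory" idea concrete, here is a minimal sketch in Java. Only the name comes from the thread; the method name and the two constants are assumptions made for illustration and are not part of MetaModel.

```java
// Hypothetical interface: the thread only floats the name "TableNameFactory".
// The method and constants below are illustrative assumptions.
interface TableNameFactory {

    // Converts a filename (e.g. "people.csv") into a table name.
    String tableNameFor(String filename);

    // Candidate default 1: keep the filename as-is, dots included.
    TableNameFactory PRESERVE_DOTS = filename -> filename;

    // Candidate default 2: replace every dot with an underscore.
    TableNameFactory UNDERSCORE_DOTS = filename -> filename.replace('.', '_');
}
```

With this shape, the open question in the discussion reduces to which of the two constants a file-based DataContext would use when no factory is supplied.
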
> >>
> >> Underscoring makes it a bit cleaner to look at the column or table
> >> paths, but it also makes the representation less direct. A user could
> >> start wondering if there are other characters than dots that will be
> >> replaced by underscores etc.
> >>
> >> It should be noted that MM's parser does support dots in both table and
> >> schema names, so this is probably mostly a question of aesthetics.
> >>
> >> The ambiguity that you point out is also interesting. So far I haven't
> >> seen it appear in real life, but technically it could occur that you had
> >> two pairs of schemas and tables that would generate an ambiguous table
> >> path. For instance:
> >>
> >> Schema: foo.bar
> >> Table: baz
> >>
> >> and
> >>
> >> Schema: foo
> >> Table: bar.baz
> >>
> >> The parser would currently favor the second schema ("foo") since it
> >> incrementally tries for schema/table/column matches with every
> >> dot-separated token. An improvement to the parser would be to allow quote
> >> characters, so that you could then express your table path like this:
> >>
> >> "foo.bar".baz
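
A minimal sketch of the quote-aware splitting described above, in Java. This is not MetaModel's parser and not the code in the gist; the class and method names are hypothetical, and escaping of quote characters inside names is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of MetaModel's API.
final class QuotedPathSplitter {

    private QuotedPathSplitter() {
    }

    // Splits a path on dots, except for dots that appear between double quotes.
    // The quote characters themselves are dropped from the resulting tokens.
    static List<String> split(String path) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < path.length(); i++) {
            char c = path.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes; // toggle quoted mode, do not keep the quote
            } else if (c == '.' && !inQuotes) {
                tokens.add(current.toString()); // unquoted dot ends the token
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (inQuotes) {
            throw new IllegalArgumentException("Unbalanced quote in path: " + path);
        }
        tokens.add(current.toString());
        return tokens;
    }
}
```

Under these assumptions, `split("\"foo.bar\".baz")` yields the two tokens `foo.bar` and `baz`, while the unquoted `foo.bar.baz` yields three tokens and so remains subject to the incremental matching described above.
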
> >>
> >> Also I want to note that some databases do support dots in
> >> schema/table/column names, so this ambiguity can (although rarely) also
> >> occur in an RDBMS or other data sources. It would also be quite common
> >> to have some separator (not necessarily a dot) in NoSQL database column
> >> names, to indicate a nested field. In HBase for instance they are
> >> referred to using a colon, like this: "columnFamily:column".
> >>
> >> All in all I am mostly leaning towards preserving the dots from the
> >> filenames, but am also very curious what other people think!
> >>
> >> 2013/8/14 Hans Drexler <[email protected]>:
> >>> Hi,
> >>>
> >>> First I agree with bumping this issue. When at the customer, this
> >>> thing caused a lot of time spent in figuring out what was going on. I am
> >>> not sure if I like the extension as part of the table name, because:
> >>> - I would never create a table in a relational database with a dot in
> >>> the name
> >>> - It creates an ambiguity. If you have a "full" path name to a column,
> >>> like "documents.people.csv.name", then it is not clear whether the
> >>> schema name is "documents.people" and the table name is "csv", or the
> >>> schema name is "documents" and the table name is "people.csv". It seems
> >>> natural to me that schema names contain dots, but not table names.
> >>>
> >>> Alternatives:
> >>> - Leave the extension out of the name (probably not acceptable,
> >>> because then you can no longer have two "tables" differing only in
> >>> extension). Although I must say that personally I think this would be
> >>> the best solution.
> >>>
> >>> - Use a conventional name, like:
> >>> Schema name: Folder name
> >>> Table name: The filename, including extension (all dots replaced by
> >>> underscores).
> >>> Resulting in e.g. a column path like this:
> >>> documents.people_csv.name
> >>>
> >>> At the customer site, the file I needed to use actually followed
> >>> this pattern: "bar/FOO.PEOPLE.IN.FILE". Using the convention, this
> >>> would become:
> >>> bar.FOO_PEOPLE_IN_FILE
> >>>
> >>> IMHO this is preferable to "bar.foo.people.in.file"
> >>>
> >>> The problem is of course that it would now be impossible to have
> >>> another file "bar/FOO_PEOPLE_IN_FILE" :-(
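
The collision noted above can be checked mechanically. The following Java sketch applies the underscore convention to a list of filenames from one folder and reports which table names are claimed by more than one file. The class and method names are hypothetical and only illustrate the convention under discussion.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper illustrating the underscore convention; not MetaModel API.
final class UnderscoreNaming {

    private UnderscoreNaming() {
    }

    // Table name = filename with every dot replaced by an underscore.
    static String tableName(String filename) {
        return filename.replace('.', '_');
    }

    // Maps each table name that is produced by two or more filenames
    // to the filenames that produce it.
    static Map<String, List<String>> collisions(List<String> filenames) {
        Map<String, List<String>> byTableName = new LinkedHashMap<>();
        for (String filename : filenames) {
            byTableName.computeIfAbsent(tableName(filename), k -> new ArrayList<>())
                    .add(filename);
        }
        // Keep only the table names that more than one file maps to.
        byTableName.values().removeIf(files -> files.size() < 2);
        return byTableName;
    }
}
```

For a folder containing both `FOO.PEOPLE.IN.FILE` and `FOO_PEOPLE_IN_FILE`, the returned map has the single key `FOO_PEOPLE_IN_FILE` with both filenames as its value.
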
> >>>
> >>> I am happy to hear other people's thoughts.
> >>>
> >>>
> >>> Hans
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: Kasper Sørensen [mailto:[email protected]]
> >>> Sent: Wednesday, August 14, 2013 10:18 AM
> >>> To: [email protected]
> >>> Subject: Re: [DISCUSS] use folder name as schema name for file based
> >>> DataContexts
> >>>
> >>> Rats, made a mistake in that diff. The Gist has been updated [1] and
> >>> now contains the ResourceUtils class which was missing before.
> >>> [1] https://gist.github.com/kaspersorensen/6210970
> >>>
> >>> 2013/8/12 Kasper Sørensen <[email protected]>:
> >>>> Here's a proposed patch (implemented for CSV and fixedwidth files
> >>>> which are the modules that implemented the old schema naming pattern):
> >>>> https://gist.github.com/kaspersorensen/6210970
> >>>>
> >>>> 2013/8/10 Kasper Sørensen <[email protected]>:
> >>>>> https://issues.apache.org/jira/browse/METAMODEL-4
> >>>>>
> >>>>> 2013/8/10 Henry Saputra <[email protected]>:
> >>>>>> What is the JIRA for this one?
> >>>>>>
> >>>>>>
> >>>>>> On Fri, Aug 9, 2013 at 2:26 AM, Manuel van den Berg <
> >>>>>> [email protected]> wrote:
> >>>>>>
> >>>>>>> +1
> >>>>>>>
> >>>>>>> (shouldn't I just vote on the Jira for this?)
> >>>>>>>
> >>>>>>> manuel
> >>>>>>>
> >>>>>>> > -----Original Message-----
> >>>>>>> > From: Kasper Sørensen [mailto:[email protected]]
> >>>>>>> > Sent: Friday, August 09, 2013 9:03
> >>>>>>> > To: [email protected]
> >>>>>>> > Subject: Re: [DISCUSS] use folder name as schema name for file
> >>>>>>> > based DataContexts
> >>>>>>> >
> >>>>>>> > Allow me to bump this issue (it's my impression that more people
> >>>>>>> > have joined in a bit late, after this topic was posted).
> >>>>>>> >
> >>>>>>> > I think this is one of the more important issues that I would
> >>>>>>> > want to fix before we make our first release at Apache.
> >>>>>>> >
> >>>>>>> > 2013/7/24 Kasper Sørensen <[email protected]>:
> >>>>>>> > > Right now we have this slightly odd naming convention for
> >>>>>>> > > schema and table names when building metadata for e.g. a CSV
> >>>>>>> > > file or a fixed width value file.
> >>>>>>> > >
> >>>>>>> > > Schema name: The filename, including file extension.
> >>>>>>> > > Table name: The filename without extension.
> >>>>>>> > > Resulting in e.g. a column path like this:
> >>>>>>> > > people.csv.people.name
> >>>>>>> > >
> >>>>>>> > > I suggest we change it to this convention:
> >>>>>>> > >
> >>>>>>> > > Schema name: Folder name
> >>>>>>> > > Table name: The filename, including file extension.
> >>>>>>> > > Resulting in e.g. a column path like this:
> >>>>>>> > > documents.people.csv.name
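
The two conventions can be compared side by side with a short Java sketch. The class and method names are hypothetical, and the way the old convention strips the extension (cutting at the last dot) is an assumption, since the thread does not show the original implementation.

```java
import java.io.File;

// Hypothetical helper contrasting the two naming conventions; not MetaModel API.
final class FileNamingConventions {

    private FileNamingConventions() {
    }

    // Old convention: schema = filename with extension,
    // table = filename without extension (assumed: cut at the last dot).
    static String oldColumnPath(File file, String column) {
        String filename = file.getName();
        int dot = filename.lastIndexOf('.');
        String table = dot > 0 ? filename.substring(0, dot) : filename;
        return filename + "." + table + "." + column;
    }

    // Proposed convention: schema = parent folder name,
    // table = filename with extension.
    static String newColumnPath(File file, String column) {
        File parent = file.getParentFile();
        if (parent == null) {
            throw new IllegalArgumentException("File has no parent folder: " + file);
        }
        return parent.getName() + "." + file.getName() + "." + column;
    }
}
```

For `new File("documents", "people.csv")` and the column `name`, the old convention gives `people.csv.people.name` and the proposed one gives `documents.people.csv.name`, matching the two examples above.
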
> >>>>>>> > >
> >>>>>>> > > Why do I think this would be an improvement?
> >>>>>>> > >
> >>>>>>> > > 1) Because, first of all, it would make sense to the user to
> >>>>>>> > > see the file system's hierarchy reflected in the schema model.
> >>>>>>> > > 2) Because it allows us to make these DataContexts operate
> >>>>>>> > > not on a single file, but on a directory of files. I have seen
> >>>>>>> > > quite a number of times by now that users of MetaModel, or
> >>>>>>> > > users of e.g. DataCleaner, which uses MetaModel quite heavily,
> >>>>>>> > > want to do this sort of stuff.
> >>>>>>> > > 3) The removing of the file extension stuff is kind of broken
> >>>>>>> > > and a strange convention in the first place.
> >>>>>>> > >
> >>>>>>> > > While this doesn't really break backwards compatibility in
> >>>>>>> > > terms of Java code, it would break configuration files and
> >>>>>>> > > other stuff of applications that use MetaModel. But I do
> >>>>>>> > > believe that can be communicated and handled through carefully
> >>>>>>> > > explaining the new convention on the migration page (that I
> >>>>>>> > > recently started writing [1]).
> >>>>>>> > >
> >>>>>>> > > What do you think?
> >>>>>>> > >
> >>>>>>> > > [1]
> >>>>>>> > > http://wiki.apache.org/metamodel/MigratingFromEobjectsMetaModel
> >>>>>>>
>
