I think the regexes have to be specified in code rather than a properties file, for a couple of reasons.
1) They are simply too complicated for many if not most users to create.

2) A regex is insufficient to do the full parsing job - a regex can tell you "look for a string of 3-7 uppercase letters here" or "look for 'RWED' or 'RWD' there" or "look for a number there" or whatnot, but it can't tell you what part of the file spec that group might belong to. That's the job of code.

Given that the regexes are complicated, it makes a lot of sense to hardcode them and place them in code with their interpreters.

The only way I can see to get beyond this is to devise a "language" - perhaps built on top of regexes - something along the lines of the SimpleDateFormat "language". Such a language goes a step further than regexes by mapping patterns to meanings. But parsing a file listing is a much more complicated job than parsing a date, and not necessarily something I want to tackle in my spare time.

Somewhat short of this goal might be a properties file that linked particular FTP sites to parsers and date formats.

-------------------------------------------------

As far as my less ambitious goal of parsing dates correctly on different locale systems is concerned, I posted a request on the Ant list for any sample FTP sites implemented in different languages and so far have not received any replies. I'll make a similar request here. If anyone has the addresses of publicly accessible ftp sites implemented in languages other than English, please pass them on to me.

I am guessing, though, that a lot of the more popular public sites are implemented in English. As an experiment, I tried to access ftp.suse.de and found that to be the case. My uneducated guess at this point is that ftp servers in other languages are more likely to be found on private corporate sites that don't allow anonymous access than on public sites.

-----Original Message-----
From: Jeffrey D. Brekke [mailto:[EMAIL PROTECTED]
Sent: Mon 3/10/2003 11:04 PM
To: Jakarta Commons Developers List
Cc:
Subject: Re: [NET] Here's an Ant bug that we should look into fixing

Steve,

I'm listening and hopefully will get more time to work on Net stuff soon. So while I can't commit to work on implementing these ideas, they sound fine and I can still run tests, generate site, and commit patches.

As I was reading this I remembered an idea from some time back: we could specify the regular expression used for a system in a properties file or something similar and have a generic parser that would look up the correct RE. This could then be configured outside the code itself as new systems are encountered. Maybe something like this could also be used to handle date formatting?

jb

>>>>> On Sun, 9 Mar 2003 14:18:18 -0600, "Steve Cohen" <[EMAIL PROTECTED]> said:

> I had thought I might hear some replies to this. The silence has
> been deafening. I have been thinking about the issue, though, in
> particular where commons-net.ftp might have to go in order to really
> implement the ambitious spec laid out for it by clients such as ant,
> which have chosen to use it.
>
> Of particular note here is the "depends" (or synonym "newer")
> attribute of the ant <ftp> task. This runs aground on the issue of
> parsing the date. In the first place, there are the issues of
> general listing format (unix, NT, VMS, etc.). In the second place,
> though, within these categories are issues of date format. This
> devolves into a thicket of locale-type issues:
>
> Does month come before date? In which language are the names of the
> months coded?
>
> To solve this, the scope of parser definition needs to be
> significantly expanded.
>
> Things might be better if there was any mechanism within the FTP
> specification for the server to expose its format to a client. No
> such mechanism exists, however.
> In fact RFC959, the FTP spec, is intentionally vague on this point:
>
> "Since the information on a file may vary widely from system to
> system, this information may be hard to use automatically in a
> program, but may be quite useful to a human user."
>
> http://www.ietf.org/rfc/rfc959.txt
>
> In other words, FTP was never meant to be used in such an automated
> fashion.
>
> Nonetheless, with the specification of parameters easily passed in
> by something like an ant task, it might be possible to define a
> parser sufficiently to perform this task. These parameters include:
>
> 1) os type of FTP server (unix, NT, OS2, VMS, etc.)
> 2) date format - to define ordering of date components - "MMM dd" or
>    "dd MMM", etc. as in simple date format
> 3) locale - to define actual abbreviations of the months.
>
> From 2 and 3 it is possible to build a Locale-specific
> SimpleDateFormat capable of parsing dates on a particular system.
> This object contains the names and abbreviations of the months.
>
> This immediately raises the question of how to divvy up the parsing
> duties between the regular expression and the SimpleDateFormat. It
> seems as if the format string must be used to construct the part of
> the regex in the correct order. Then the SimpleDateFormat would be
> used to actually parse the date. All "optimizations" such as
> assuming a constant character width of 3 for month abbreviations are
> out the window here - they work for many languages, but not for all.
> French, for example, uses periods and varying lengths.
>
> A cautionary note: one would have to inspect actual ftp sites to
> determine whether they actually use the abbreviations specified in
> java Locales.
>
> Comments? Is this a Pandora's box that we don't want to open?
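
The idea in the quoted message of combining a date format string (parameter 2) with a locale (parameter 3) could be sketched roughly as below. This is only an illustration under assumptions: the class name ListingDateParser is hypothetical and is not part of commons-net, the date text is assumed to have already been cut out of the listing line, and the sample strings include a year, which real listings often omit.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical helper for illustration only; not a commons-net class. */
final class ListingDateParser {
    private final SimpleDateFormat format;

    /**
     * @param pattern SimpleDateFormat pattern giving the order of the date
     *                components, e.g. "MMM d yyyy" or "d MMM yyyy"
     * @param locale  locale supplying the month names and abbreviations
     */
    ListingDateParser(String pattern, Locale locale) {
        this.format = new SimpleDateFormat(pattern, locale);
        // Reject impossible dates instead of silently rolling them over.
        this.format.setLenient(false);
        // Assumption: treat server times as GMT so results are reproducible.
        this.format.setTimeZone(TimeZone.getTimeZone("GMT"));
    }

    /** Parses the date portion of a listing line. */
    Date parse(String dateText) {
        try {
            return format.parse(dateText);
        } catch (ParseException e) {
            throw new IllegalArgumentException(
                "Cannot parse '" + dateText + "' with " + format.toPattern(), e);
        }
    }

    public static void main(String[] args) {
        ListingDateParser english = new ListingDateParser("MMM d yyyy", Locale.ENGLISH);
        ListingDateParser french = new ListingDateParser("d MMM yyyy", Locale.FRENCH);
        // Both strings name the same day; only component order and language differ.
        System.out.println(english.parse("Mar 10 2003").equals(french.parse("10 mars 2003")));
    }
}
```

Whether a given server actually emits the abbreviations that a Java Locale expects is exactly the open question raised in the cautionary note above, so a sketch like this would need checking against real listings.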
> -----Original Message-----
> From: Steve Cohen
> Sent: Wed 3/5/2003 1:53 PM
> To: Jakarta Commons Developers List
> Cc:
> Subject: [NET] Here's an Ant bug that we should look into fixing
>
> The <ftp> task of ant doesn't work right because we don't parse
> non-English date formats.
>
> http://nagoya.apache.org/bugzilla/show_bug.cgi?id=14333
>
> -----------------------------------------------
> Steve Cohen
> Sr. Software Engineer
> Sportvision Inc.
> [EMAIL PROTECTED]
> http://www.sportvision.com
>
> Please note: As a result of the merger of Ignite Sports and
> Sportvision, my email address has changed to [EMAIL PROTECTED]

--
=====================================================================
Jeffrey D. Brekke [EMAIL PROTECTED]
Wisconsin, USA [EMAIL PROTECTED]
[EMAIL PROTECTED]

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
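
On the point made at the top of this thread, that a regex can find groups but only code can say which part of the file spec each group belongs to, a minimal sketch of a hardcoded regex kept together with its interpreter might look like this. The class UnixListingSketch, its Entry type, and the simplified pattern are all hypothetical and are not the actual commons-net parser; the pattern only covers a plain Unix-style line with a three-token date.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch for illustration only; not the commons-net parser. */
final class UnixListingSketch {
    // Simplified, assumed layout:
    // type+permissions, link count, owner, group, size, 3-token date, name
    private static final Pattern LINE = Pattern.compile(
        "^([-dl])([-rwxsStT]{9})\\s+(\\d+)\\s+(\\S+)\\s+(\\S+)\\s+(\\d+)\\s+"
        + "(\\S+\\s+\\S+\\s+\\S+)\\s+(.+)$");

    /** The meaning assigned to each matched group. */
    static final class Entry {
        final boolean directory;
        final String owner;
        final long size;
        final String rawDate; // left unparsed; date parsing is a separate step
        final String name;

        Entry(boolean directory, String owner, long size, String rawDate, String name) {
            this.directory = directory;
            this.owner = owner;
            this.size = size;
            this.rawDate = rawDate;
            this.name = name;
        }
    }

    /** Returns the interpreted entry, or null if the line does not match. */
    static Entry parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        // The regex only says "group 6 is digits"; this code says "it is the size".
        return new Entry(
            "d".equals(m.group(1)),
            m.group(4),
            Long.parseLong(m.group(6)),
            m.group(7),
            m.group(8));
    }

    public static void main(String[] args) {
        Entry e = parse("-rw-r--r--   1 steve    users        1024 Mar 10  2003 notes.txt");
        System.out.println(e.name + " " + e.size + " [" + e.rawDate + "]");
    }
}
```

The group-to-field mapping in parse() is the part a properties file holding only the regex could not express, which is the argument for keeping pattern and interpreter in the same class.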
