I prefer the alternative,
and if no parameter is set, just keep it stupid simple: assume UTF-8.

IMHO, this will give good results and will be easy to explain

anything more complex is harder to maintain and to explain in case magic does 
not do what was dreamt of
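
To make the order of precedence concrete, here is a rough sketch of how I read the alternative from the quoted message below: the licenseEncoding parameter wins when it is set, otherwise the charset qualifier of the content type is used, otherwise UTF-8. The class and method names are made up for illustration and are not existing plugin code. The HTML/JSoup step is deliberately left out of the sketch, and an unknown charset name in the content type is treated as "nothing determined", which is my assumption.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper, for illustration only.
class LicenseEncodingResolver {

    /** Returns the charset named in a Content-Type value, if present and supported. */
    static Optional<Charset> charsetFromContentType(String contentType) {
        if (contentType == null) {
            return Optional.empty();
        }
        // A Content-Type value looks like "text/html; charset=ISO-8859-1".
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Optional.of(Charset.forName(name));
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name: treat as not determined.
                    return Optional.empty();
                }
            }
        }
        return Optional.empty();
    }

    /**
     * Resolves the encoding to use for a license file.
     * licenseEncoding is the user-supplied parameter (may be null or empty);
     * contentType is the Content-Type reported for the license (may be null).
     */
    static Charset resolve(String licenseEncoding, String contentType) {
        // 1. If the parameter is set, use it regardless of the rest.
        if (licenseEncoding != null && !licenseEncoding.isEmpty()) {
            return Charset.forName(licenseEncoding);
        }
        // 2. Otherwise use the charset qualifier of the content type, if any.
        // (The HTML detection step would go here; it is omitted in this sketch.)
        // 3. If nothing can be determined, assume UTF-8.
        return charsetFromContentType(contentType).orElse(StandardCharsets.UTF_8);
    }
}
```

With this, resolve("ISO-8859-1", "text/html; charset=UTF-8") gives ISO-8859-1 because the parameter takes precedence, and resolve(null, "text/plain") gives UTF-8.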

Regards,

Hervé

On Friday, 14 November 2014 18:43:02, Michael Osipov wrote:
> On 2014-11-14 at 18:07, Hervé BOUTEMY wrote:
> > [..]
> > 
> >> The parameter won't help if there are several licenses with several
> >> encodings used.
> > 
> > looks like the parameter can be either simple or complex: need a syntax
> > 
> > or just ignore: is it theory or reality?
> 
> Pure theory.
> 
> My approach would be this:
> 
> provide a license parameter: licenseEncoding
> 
> 1. Obtain the content type
> 2. Check whether it contains a charset qualifier; if yes, use that
> 3. If not, check whether this is an HTML file and pass it to JSoup (do magic)
> 4. If nothing else can be determined, use the parameter
> 5. If the parameter is not set, assume UTF-8
> 
> Alternative:
> 
> 1. If the parameter is set, use that regardless of the rest
> 2. If not, continue with the first approach and omit step 4
> 
> WDYT?
> 
> Michael
> 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]

