Re: RFC Parsing Library

Anders Nygren Thu, 06 Oct 2011 10:45:36 -0700

It would be better if I also remembered to add the files.
So here we go again.
I also added the generated .erl file for those that do not want to
install abnfc.


/Anders

On Thu, Oct 6, 2011 at 12:41 PM, Anders Nygren <[email protected]> wrote:
> Hi
> Just for fun I made an RFC 4180 parser using the ABNF in the RFC.
>
> Compile rfc4180.abnf using abnfc
>
> abnfc -binary rfc4180.abnf
>
> And you get a working CSV parser.
> Some things in the RFC's textual description are not reflected in the
> ABNF, e.g. whether the file has a header row, and the generated parser
> does not check that all rows have the same number of fields.
>
> /Anders
>
> On Wed, Oct 5, 2011 at 12:15 PM, Anders Nygren <[email protected]> wrote:
>> On Wed, Oct 5, 2011 at 12:09 PM, Eric Merritt <[email protected]> wrote:
>>> On 10/05/11 at 11:48am, Anders Nygren wrote:
>>>> Eric
>>>> In case you are not already aware of it, I would like to point you to
>>>> my ABNF parser generator, abnfc (most RFCs are specified using ABNF).
>>>
>>> I wasn't aware of it at all. That's good to know. I probably will not
>>> go back and rewrite the existing stuff. However, I will absolutely
>>> keep it in mind for the future.
>>>
>>>> https://github.com/nygge/abnfc
>>>>
>>>> The documentation is nonexistent, but it was presented at EUC 2008
>>>>
>>>> http://www.erlang.org/euc/08/1500Nygren2.pdf
>>>>
>>>> There have been some changes since then, so contact me if you have any
>>>> questions or problems.
>>>>
>>>> In general I am not sure it is a good idea to put all RFC parsers in one
>>>> Erlang application, since after a while you may end up with a large
>>>> amount of unrelated stuff.
>>>
>>> I am always on the fence about this. I am not a big fan of OTP apps
>>> with a single Erlang file, though there is nothing intrinsically wrong
>>> with that. I try to group modules in some reasonable way. In this case, I
>>> think that the fact that this is a library application (i.e. static)
>>> implies that there is no system overhead in having all the RFCs
>>> there. Though, as you say, there may be some conceptual overhead.
>>>
>>
>> Thinking a little more about this, it is not so bad, since with reltool
>> it is fairly simple to specify which modules to include from an application
>> when creating a release.
>>
>> /Anders
>>
>>>
>>>> But on the other hand many RFCs import parts of other RFCs, so I do
>>>> not have any good ideas on how to structure it.
>>>
>>> Right now it's just going to be structured by name: each RFC
>>> implementation will be named erfc_<rfc number>.erl. For example, RFC 8120
>>> would be named erfc_8120.erl.
>>>
>>>>
>>>> Maybe the best is just to make a big bag of parsers and let people pick
>>>> the ones they need.
>>>
>>> That is exactly the current plan. :)
>>>
>>> Thanks Anders, your input is always rock solid.
>>>>
>>>> /Anders
>>>>
>>>> On Wed, Oct 5, 2011 at 10:48 AM, Eric Merritt <[email protected]> wrote:
>>>> > Hello All,
>>>> >
>>>> > I have started an RFC parsing library in support of my latest
>>>> > endeavor. This seems like a target to put under erlware. I thought I
>>>> > would float the idea to the list.
>>>> >
>>>> > https://github.com/ericbmerritt/erfc_parsers
>>>> >
>>>> > I don't have a specific goal of implementing all RFCs, but as I need
>>>> > them (RFC 821 should show up shortly) I will add them.
>>>> >
>>>> > What does everyone think of putting them under erlware?
>>>> >
>>>> > Eric
>>>> >
>>>> > --
>>>> > You received this message because you are subscribed to the Google 
>>>> > Groups "erlware-dev" group.
>>>> > To post to this group, send email to [email protected].
>>>> > To unsubscribe from this group, send email to 
>>>> > [email protected].
>>>> > For more options, visit this group at 
>>>> > http://groups.google.com/group/erlware-dev?hl=en.
>>>> >
>>>> >
>>>>
>>>
>>
>

rfc4180.abnf
Description: Binary data

rfc4180.hrl
Description: Binary data

%%% Do not modify this file, it is automatically generated by abnfc.
%%% All changes will be lost when it is regenerated.
%%% Generated by abnfc_gen on 2011-10-06 12:36:08

-module(rfc4180).

-export(['COMMA'/0, 'CR'/0, 'CRLF'/0, 'DDQUOTE'/0,
	 'DQUOTE'/0, 'LF'/0, 'TEXTDATA'/0, decode/2, escaped/0,
	 field/0, file/0, header/0, name/0, 'non-escaped'/0,
	 record/0]).

-include("rfc4180.hrl").

decode(file, Str) -> (file())(Str);
decode(header, Str) -> (header())(Str);
decode(record, Str) -> (record())(Str);
decode(name, Str) -> (name())(Str);
decode(field, Str) -> (field())(Str);
decode(escaped, Str) -> (escaped())(Str);
decode('non-escaped', Str) -> ('non-escaped'())(Str);
decode('COMMA', Str) -> ('COMMA'())(Str);
decode('CR', Str) -> ('CR'())(Str);
decode('DDQUOTE', Str) -> ('DDQUOTE'())(Str);
decode('DQUOTE', Str) -> ('DQUOTE'())(Str);
decode('LF', Str) -> ('LF'())(Str);
decode('CRLF', Str) -> ('CRLF'())(Str);
decode('TEXTDATA', Str) -> ('TEXTDATA'())(Str).

file() ->
    fun (T) ->
	    __P = '__seq'(['__repeat'(0, 1,
				      '__seq'([header(), 'CRLF'()])),
			   record(),
			   '__repeat'(0, infinity, '__seq'(['CRLF'(), record()])),
			   '__repeat'(0, 1, 'CRLF'())]),
	    case __P(T) of
	      {ok, [_YY1, _YY2, _YY3, _YY4] = _YY, _T1} ->
		  try #file{header = hd(hd(_YY1)),
			    records = [_YY2 | [Rec || [_, Rec] <- _YY3]]}
		  of
		    __Ret -> {ok, __Ret, _T1}
		  catch
		    fail -> fail
		  end;
	      fail -> fail
	    end
    end.

header() ->
    fun (T) ->
	    __P = '__seq'([name(),
			   '__repeat'(0, infinity, '__seq'(['COMMA'(), name()]))]),
	    case __P(T) of
	      {ok, [_YY1, _YY2] = _YY, _T1} ->
		  try #header{names =
				  [_YY1 | [Name || [_, Name] <- _YY2]]}
		  of
		    __Ret -> {ok, __Ret, _T1}
		  catch
		    fail -> fail
		  end;
	      fail -> fail
	    end
    end.

record() ->
    fun (T) ->
	    __P = '__seq'([field(),
			   '__repeat'(0, infinity,
				      '__seq'(['COMMA'(), field()]))]),
	    case __P(T) of
	      {ok, [_YY1, _YY2] = _YY, _T1} ->
		  try #record{fields =
				  [_YY1 | [Value || [_, Value] <- _YY2]]}
		  of
		    __Ret -> {ok, __Ret, _T1}
		  catch
		    fail -> fail
		  end;
	      fail -> fail
	    end
    end.

name() -> fun (T) -> __P = field(), __P(T) end.

field() ->
    fun (T) ->
	    __P = '__alt'([escaped(), 'non-escaped'()]), __P(T)
    end.

escaped() ->
    fun (T) ->
	    __P = '__seq'(['DQUOTE'(),
			   '__repeat'(0, infinity,
				      '__alt'(['TEXTDATA'(), 'COMMA'(), 'CR'(), 'LF'(),
					       'DDQUOTE'()])),
			   'DQUOTE'()]),
	    case __P(T) of
	      {ok, [_YY1, _YY2, _YY3] = _YY, _T1} ->
		  try _YY2 of
		    __Ret -> {ok, __Ret, _T1}
		  catch
		    fail -> fail
		  end;
	      fail -> fail
	    end
    end.

'non-escaped'() ->
    fun (T) ->
	    __P = '__repeat'(0, infinity, 'TEXTDATA'()), __P(T)
    end.

'COMMA'() ->
    fun (T) ->
	    __P = fun (<<44, Tl/binary>>) -> {ok, 44, Tl};
		      (_) -> fail
		  end,
	    __P(T)
    end.

'CR'() ->
    fun (T) ->
	    __P = fun (<<13, Tl/binary>>) -> {ok, 13, Tl};
		      (_) -> fail
		  end,
	    __P(T)
    end.

'DDQUOTE'() ->
    fun (T) ->
	    __P = '__repeat'(2, 2, 'DQUOTE'()),
	    case __P(T) of
	      {ok, _YY, _T1} ->
		  try $" of
		    __Ret -> {ok, __Ret, _T1}
		  catch
		    fail -> fail
		  end;
	      fail -> fail
	    end
    end.

'DQUOTE'() ->
    fun (T) ->
	    __P = fun (<<34, Tl/binary>>) -> {ok, 34, Tl};
		      (_) -> fail
		  end,
	    __P(T)
    end.

'LF'() ->
    fun (T) ->
	    __P = fun (<<10, Tl/binary>>) -> {ok, 10, Tl};
		      (_) -> fail
		  end,
	    __P(T)
    end.

'CRLF'() ->
    fun (T) ->
	    __P = '__alt'(['__seq'(['CR'(), 'LF'()]), 'LF'()]),
	    __P(T)
    end.

'TEXTDATA'() ->
    fun (T) ->
	    __P = fun (<<C, Tl/binary>>)
			  when (C >= 32) and (C =< 33) ->
			  {ok, C, Tl};
		      (<<C, Tl/binary>>) when (C >= 35) and (C =< 43) ->
			  {ok, C, Tl};
		      (<<C, Tl/binary>>) when (C >= 45) and (C =< 126) ->
			  {ok, C, Tl};
		      (_) -> fail
		  end,
	    __P(T)
    end.

'__alt'([P | Ps]) ->
    fun (T) ->
	    case P(T) of
	      {ok, _R, _T1} = Res -> Res;
	      fail ->
		  case Ps of
		    [] -> fail;
		    _ -> ('__alt'(Ps))(T)
		  end
	    end
    end.

'__repeat'(Min, Max, P) -> '__repeat'(Min, Max, P, 0).

'__repeat'(Min, Max, P, Found) ->
    fun (T) ->
	    case P(T) of
	      {ok, R1, T1} when Max == Found + 1 -> {ok, [R1], T1};
	      {ok, R1, T1} ->
		  case ('__repeat'(Min, Max, P, Found + 1))(T1) of
		    {ok, R2, T2} -> {ok, [R1 | R2], T2};
		    fail when Found >= Min -> {ok, [R1], T1};
		    fail -> fail
		  end;
	      fail when Found >= Min -> {ok, [], T};
	      fail -> fail
	    end
    end.

'__seq'([P | Ps]) ->
    fun (T) ->
	    case P(T) of
	      {ok, R1, T1} ->
		  case ('__seq'(Ps))(T1) of
		    {ok, R2, T2} -> {ok, [R1 | R2], T2};
		    fail -> fail
		  end;
	      fail -> fail
	    end
    end;
'__seq'([]) -> fun (T) -> {ok, [], T} end.
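
As noted in the thread, the generated parser does not check that all rows
have the same number of fields. A minimal sketch of such a check could look
like the module below. The module name csv_check and the function
same_field_count/1 are hypothetical, and the #record{} definition is an
assumption inferred from the generated code above, since the real definition
lives in rfc4180.hrl, which is not shown here.

```erlang
-module(csv_check).

-export([same_field_count/1]).

%% Assumed record shape, inferred from the generated rfc4180.erl.
%% The real definition is in rfc4180.hrl (not shown in this thread).
-record(record, {fields}).

%% Returns true when every #record{} in the list has the same number
%% of fields as the first one. An empty list trivially passes.
same_field_count([]) ->
    true;
same_field_count([#record{fields = First} | Rest]) ->
    N = length(First),
    lists:all(fun (#record{fields = Fs}) -> length(Fs) =:= N end, Rest).
```

Under these assumptions, the function would be applied to the records list of
the #file{} value returned by a successful rfc4180:decode(file, Bin) call.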
