> > Hasn't someone already fixed this problem?  Isn't there a CPAN module to
> > perform standardized bibliographic reference formatting/parsing?  I haven't
> > looked at CPAN; did either of you?  If a CPAN module doesn't exist, one
> > should!
> >
>
> What standard?
>
> Kalthoff K (2001) Analysis of biological development. McGraw-Hill, NY.
>
>
> Or
>
>
> > Manning JT, Barley L, Walton J, Lewis-Jones DI, Trivers RL, Singh D,
> > Thornhill R, Rohde P, Bereczkei T, Henzi P, Soler M, Szwed A. (2000) The
> > 2nd:4th digit ratio, sexual dimorphism, population differences, and
> > reproductive success. evidence for sexually antagonistic genes? Evol Hum
> > Behav. 21(3):163-183.
>
>
> Or
>
>
> > Berger, M., Lawrence, M., Demichelis, F., Drier, Y., Cibulskis, K.,
> > Sivachenko, A., Sboner, A., Esgueva, R., Pflueger, D., Sougnez, C.,
> > Onofrio, R., Carter, S., Park, K., Habegger, L., Ambrogio, L.,
> > Fennell, T., Parkin, M., Saksena, G., Voet, D., Ramos, A., Pugh, T.,
> > Wilkinson, J., Fisher, S., Winckler, W., Mahan, S., Ardlie, K.,
> > Baldwin, J., Simons, J., Kitabayashi, N., MacDonald, T., Kantoff, P.,
> > Chin, L., Gabriel, S., Gerstein, M., Golub, T., Meyerson, M., Tewari, A.,
> > Lander, E., Getz, G., Rubin, M., & Garraway, L. (2011). The genomic
> > complexity of primary human prostate cancer Nature, 470 (7333), 214-220
> > DOI: 10.1038/nature09744
>
>
> ?
>
> If there's a standard, then sure, someone has probably put that into CPAN.
> The problem is that I don't think that there is, though I'd be glad to be
> proven wrong.
>
>

> > What I want to be able to do eventually is parse each name separately and
> > associate that with the title. I am not sure how yet, but I haven't even
> > got there.
> >
> >
> That can range from pretty simple to fairly complex, depending on how much
> you want to squeeze out of that relationship. If you just want to be able to
> say "Morgan, M.J wrote an article for X journal, titled Y", then that's just
> a hash (of hashes), and you need to look no further than this mail. But if
> you also want to say, "Journal X has these authors. One of them is Wilson,
> C.E, who co-wrote article Y, where Crim, L.W. was also a collaborator, and
> whose primary author is Morgan, M.J.", then hashes will probably not cut it
> anymore (a cyclical hash of hashes might do, but that's pretty tough to
> handle, and _very_ rough on the eyes). You'll probably want an object model
> there, or some database interaction.
>
> But we are getting ahead of ourselves for now :)
>
>
I figured that eventually it would be easier to pass the results into MySQL
tables somehow, but I'll cross that bridge when I get there.
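
For the simple case described above, a hash of hashes could look roughly like
the sketch below. The journal and title strings are placeholders, and the
describe() helper and the "coauthors" key are hypothetical names chosen only
for illustration; nothing here comes from the original snippet.

```perl
use strict;
use warnings;

# Placeholder data: one record per primary author (hypothetical layout).
my %articles = (
    'Morgan, M.J.' => {
        journal   => 'Journal X',
        title     => 'Title Y',
        coauthors => [ 'Wilson, C.E.', 'Crim, L.W.' ],
    },
);

# Hypothetical helper: build the "author wrote ..." sentence for one author.
sub describe {
    my ($author) = @_;
    my $rec = $articles{$author} or return;
    return "$author wrote an article for $rec->{journal}, titled $rec->{title}";
}

print describe('Morgan, M.J.'), "\n";
```

This answers "who wrote what" with a single lookup, but it cannot easily go
from a co-author back to the article, which is where an object model or a
database starts to pay off.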


>
>
> > It works fine for the first name, but as expected if @entries contain
> > several strings with authors names (I did that by matching the year and
> > storing $` in the @entries) it will match the first author and it will go
> > to the next $entries. Is there a way to match the pattern more than once,
> > but to store each match separately?
> >
>
> You are looking for the /g switch. You can look it up in perlretut[0].
>
>
I actually remember reading in the Llama book that the /g modifier could be
used with m// as well as with s///, and thinking: but when would you need it
with m//? :)
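
A minimal sketch of what /g buys you with m//: in list context, the match
returns every capture at once, so each author lands in its own array element.
The pattern is only an assumption about the name shape (capitalised surname,
comma, dotted initials), and extract_names() is a hypothetical helper, not the
code from the earlier mail.

```perl
use strict;
use warnings;

# Hypothetical helper: return every "Surname, I.N." found in the string.
sub extract_names {
    my ($entry) = @_;
    # With /g in list context, m// returns all captured matches, not just one.
    my @names = $entry =~ /(\p{Lu}[\p{L}'-]+,\s*(?:\p{Lu}\.)+)/g;
    return @names;
}

# Stand-in for one element of @entries (the text that was in $` before the year).
my @names = extract_names('Morgan, M.J., Wilson, C.E., and Crim, L.W. ');
print "$_\n" for @names;
```

Without /g the same match would stop after the first author, which is the
behaviour described in the question.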


> > For example, would I be able to store
> > Morgan, M.J. as one item in an array and Wilson, C.E. as another one?
> >
> >
> >
> Sure. The my @names = ... from above will suffice for that. But chances are
> you want more than that. In general, you have two options. Either you make
> several small regexes to extract the data piece by piece, or you create a
> grammar to do the job for you. For the latter, there are two main options: a
> (?(DEFINE)) pattern, which is Pure Perl and in the language since 5.010, or
> you pull out Regexp::Grammars from CPAN. They are pretty similar, but
> Regexp::Grammars is much more powerful, letting you access the full parse
> tree - so what I'll have to do in two steps in the next snippet, R::G would
> do in one.
>
> Here's my stab at it, using (?(DEFINE))[1], named captures[2], Unicode
> character properties[3], and a probably unnecessary lookbehind[1] in the
> split by the end. I made some arbitrary assumptions on the data, like saying
> that a title can't be longer than 52 characters, or can't have a period in
> it, or that the journal's name can't have digits in it, which I suppose is a
> tad disingenuous, but take it as an example, not a solution : P
>
>
Thanks! This gives me a lot to read on.
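
To keep the idea in one place, here is a small sketch of the two-step approach
described above: a (?(DEFINE)) pattern with named captures pulls the fields
out, then a split with a lookbehind separates the authors. It is not the
snippet from the earlier mail. The entry string, the field layout, and
parse_reference() are all assumptions made for illustration, and it keeps the
same simplifications (no period in the title, no digits in the journal name).

```perl
use strict;
use warnings;
use 5.010;    # (?(DEFINE)) and named captures need Perl 5.10 or later

# Assumed layout: Authors (Year) Title. Journal Volume:Pages
my $re = qr{
    \A
    (?<authors> (?&name) (?: ,\s+ (?&name) )* ) \s+
    \( (?<year> \d{4} ) \) \s+
    (?<title> [^.]+ ) \. \s+
    (?<journal> \P{Nd}+? ) \s+
    (?<volume> \d+ ) : (?<pages> \d+ - \d+ )
    \z

    (?(DEFINE)
        (?<name>     \p{Lu} [\p{L}'-]+ , \s* (?&initials) )
        (?<initials> (?: \p{Lu} \. )+ )
    )
}x;

# Hypothetical helper: return a hash ref of fields, or nothing on no match.
sub parse_reference {
    my ($entry) = @_;
    return unless $entry =~ $re;
    my %ref = map { $_ => $+{$_} } qw(authors year title journal volume pages);
    # Second step: split only on a comma that follows an initial's period.
    $ref{names} = [ split /(?<=\.),\s+/, $ref{authors} ];
    return \%ref;
}

# Placeholder entry, not a real citation.
my $ref = parse_reference('Morgan, M.J., Wilson, C.E., Crim, L.W. (2000) Title Y. Journal X 1:2-3');
say "$_ co-wrote '$ref->{title}' in $ref->{journal}" for @{ $ref->{names} };
```

As noted in the quoted reply, Regexp::Grammars could return the individual
names as part of the parse tree, so the separate split step would not be
needed there.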

Cheers,

T.



-- 
"Education is not to be used to promote obscurantism." - Theodosius
Dobzhansky.

"Gracias a la vida que me ha dado tanto
Me ha dado el sonido y el abecedario
Con él, las palabras que pienso y declaro
Madre, amigo, hermano
Y luz alumbrando la ruta del alma del que estoy amando

Gracias a la vida que me ha dado tanto
Me ha dado la marcha de mis pies cansados
Con ellos anduve ciudades y charcos
Playas y desiertos, montañas y llanos
Y la casa tuya, tu calle y tu patio"

Violeta Parra - Gracias a la Vida

Tiago S. F. Hori
PhD Candidate - Ocean Science Center-Memorial University of Newfoundland
